1 Introduction

Battery packs are widely used in many important areas, such as electric vehicles (EVs), plug-in electric vehicles (PHEVs), smart grids, and aerospace [1]. A battery pack consists of hundreds of battery cells connected in series and parallel, which makes it difficult to manage [2]. Due to inconsistencies (variation of the cells) in production, packaging, and usage, the state of health (SOH) of a battery pack deteriorates faster than a single-battery cell, making it hard to estimate [3]. Therefore, the inconsistency evaluation and SOH estimation of battery packs are drawing increasing attention.

The inconsistencies of battery packs mainly include internal and external parameter inconsistencies [4]. Internal parameters such as capacity, resistance, open-circuit voltage (OCV), and state of charge (SOC) may have inconsistencies due to the complexity of the manufacturing and packaging process [5, 6]. During operation, other external parameters including current and temperature would cause further inconsistencies [7, 8]. The methods for inconsistency evaluation can be divided into three categories: signal processing methods, model-based methods, and feature fusion-based methods [4]. The signal-processing methods evaluate inconsistency through direct detection of parameters such as time-domain voltage [9] and frequency domain electrochemical impedance spectrum (EIS) [10]. Since the measurable parameters are limited and EIS measurement needs to be done offline, researchers proposed to use models to detect the parameter inconsistency. Chen et al. [11] detected the internal resistance inconsistency through parameter identification of the equivalent circuit models (ECMs). Zheng et al. [12] and Hu et al. [13] established the mean-plus-difference model for the series battery pack and adopted an extended Kalman filter to detect SOC differences among battery cells. In recent years, the feature fusion-based methods have drawn increasing attention. Duan et al. [14] used information entropy to analyze the inconsistency of a battery pack with twelve cells connected in series. The capacity, internal resistance, and coulomb efficiency were selected as features. Tian et al. [15] extracted OCV, ohm resistance, and polarization resistance as features; entropy weights were distributed to assess the inconsistency. However, most of the above features require parameter identification or large amount of test data, which increases the computational cost and storage cost. In addition, some features are extracted from the processed curves, which could possibly lose accuracy. Thus, it is more efficient to use the features extracted directly from onboard measurements to evaluate the inconsistency. What’s more, the inconsistency hasn’t been used in the SOH estimation, although it is an important affecting factor.

The SOH is a significant index evaluating the lifespan of batteries. Many studies have reported various methods to estimate battery SOH. The existing estimation methods can be divided into two categories: model-based methods and data-driven methods [16, 17]. Battery models, including ECMs [18, 19] and electrochemical models [20], can be used to estimate the characteristic parameters that are highly related to SOH. Kim et al. [21] established an ECM and used a dual extended Kalman filter to jointly estimate SOC and SOH. Bi et al. [22] simplified the ECM for battery packs, then the internal resistance was identified to estimate the SOH, and the genetic resampling particle filter was used to solve the non-gaussian problem. ECM is simple in structure and has a low computational cost, although its accuracy is low due to the approximation. Hence, the electrochemical models such as the single-particle model (SPM) and pseudo-two-dimensional (P2D) model were used for SOH estimation [23]. Prasad et al. [24] developed and simplified a SPM based on two aging parameters, then used the least square and recursive parameter estimators for SOH estimation. Hu et al. [25] established a reduced SPM and created a multiscale moving horizon estimation scheme to fulfill SOH monitoring through estimating the battery capacity and resistance. Though electrochemical models have good accuracy, they need a large amount of calculation time. Therefore, data-driven methods have been developed to solve this problem [26]. Yang et al. [27] extracted health indicators (HIs) from the charging curve, used gray relational analysis to evaluate the relational degree, and applied Gaussian progress regression (GPR) to estimate the SOH. Through incremental capacity (IC) and differential voltage (DV) analysis, more aging features could be extracted [28, 29]. Weng et al. [30] tracked the peak values of the IC curve by support vector regression and estimated the battery pack SOH. Berecibar et al. [31] estimated the SOH by tracking the location change of valley values in the DV curve. In addition, researchers also extracted HIs from the frequency domain [32], such as sample entropy [33] and mechanical parameters [34] to estimate SOH. The above methods mainly focus on the SOH estimation of battery cells, and HIs are mostly not simply extracted. However, the SOH of battery packs is seldom studied, especially considering the inconsistency. In the existing studies, only the correlation analysis is used for feature selection. However, some important HIs are removed, while other redundant HIs are retained, which will reduce the estimation accuracy and reliability [35].

This paper proposes a joint estimation method of inconsistency and SOH for series-connected battery packs. The main contributions are summarized as follows. First, the inconsistency features are extracted from current change points in the charging process, thereby reducing the computational cost, which supports both inconsistency evaluation and SOH estimation. Second, the hierarchy weights and multi-scale entropy weights are fused to allocate fusion weights to each feature for the inconsistency evaluation, and the results reflect the aging status of the battery packs. An inconsistency reference table is proposed based on the correlation between inconsistency and SOH. Third, the wrapper method is used for feature selection to remove the redundant features. More accurate and reliable SOH estimation results are obtained. The estimation remains accurate even only 10% of the data is used for model training. Fourth, a joint estimation system of inconsistency and SOH is designed with the potentials of onboard application.

This paper is structured as below. In Sect. 2, the test and data acquisition are introduced, and then the feature extraction method is described. Next, in Section 3, the joint estimation system of inconsistency and SOH is designed. After that, the estimation results are given in Sect. 4. Finally, the main conclusions are given in Sect. 5.

2 Data Acquisition and Feature Extraction

Tests are carried out to collect the test data of aging battery packs. The test platform is shown in Fig. 1a, which includes a digatron battery tester, a data logger, a thermal chamber, a computer, and a series-connected battery pack. The ‘Digatron’ battery tester is used to charge and discharge the battery pack. The data logger is used to gather the cell voltage and temperature; the thermal chamber is used to keep the temperature constant; the computer is used to control the test process and store the test data. The sample period and chamber temperature are set to 1 min and 25 °C, respectively. The series-connected battery pack consists of four squared battery cells, and the nominal capacity is 177 A·h. The cathode and anode are Li(Ni0.8Co0.1Mn0.1)O2 and graphite, respectively, and the upper and lower cutoff voltage of battery cells is 4.2 V and 2.8 V, respectively. A five-stage constant current charging and constant current discharge strategies are used for the aging cycles according to the real conditions in EVs, which is shown in Fig. 1b.

Fig. 1
figure 1

Aging test design for the battery pack

Feature extraction is the foundation and preprocess for the inconsistency and SOH estimation. One simple but useful feature could make the estimation more accurate and reliable with a low computational cost. In EVs/PHEVs, it is rare to deplete the battery pack completely. Therefore, the initial charge points are not constant. The first three current change points during the charging process are selected for feature extraction. The SOC of this segment is about 40%–80%, which coincides with the range of most onboard charge processes [36]. The features are extracted from the current change points, with no model needed and a simple calculation process. Moreover, it is not necessary to confirm the initial charging point. The voltage inconsistency will cause the battery pack voltage at the current switch points to decrease during the aging process. In the test strategy, there is a current switch at the change point, which causes a voltage drop correspondingly, through which the internal resistance could be obtained. Thus, these natural and simple features are extracted as follows:

  • F11–F31: Battery cells peak voltage range at the first three current change points;

  • F12–F32: Battery cells voltage drop range at the first three current change points;

  • F13–F33: Battery cells peak voltage standard deviation at the first three current change points;

  • F14–F34: Battery cell voltages drop standard deviation at the first three current change points,

  • F15–F35: Battery pack peak voltages at the first three current change points;


where the first subscript denotes the current change point, and the second subscript denotes the feature number. There are fifteen features extracted from the voltage curve, which represent the voltage and internal resistance inconsistency in the battery pack. And, just the voltage data at the current change points are used for feature extraction, which makes the extraction method very simple and easy to be applied in the battery management system (BMS).

3 Methodology

The extracted features which reflect the inconsistency of the battery pack are described in the above section. After the features are extracted from the partial charging process, they are then used for the estimation of inconsistency and SOH. In this section, the algorithms for joint inconsistency and SOH estimation are introduced. Firstly, in Sect. 3.1 the battery pack inconsistency evaluation method is introduced. Then, the methodology for SOH estimation is proposed in Sect. 3.2. At last, the proposed joint estimation system is designed in Sect. 3.3.

3.1 Method for Inconsistency Evaluation

Different features have different contributions to the inconsistency, so a fusion weight distribution method is proposed. The hierarchical weight and multi-scale entropy weight are introduced in Sects. 3.1.1 and 3.1.2, respectively. Then, a fusion weight method is proposed in Sect. 3.1.3.

3.1.1 Analytic Hierarchy Process

The analytic hierarchy process (AHP) is one of the most widely used multiple criteria decision-making tools. The decision goal is divided into many indexes, according to which the weights are allocated. A detailed description of AHP can be found in Ref. [37]. The inconsistency evaluation is a multiple index decision-making process. Therefore, the AHP is used to allocate subjective weights.

The hierarchy for the inconsistency evaluation is shown in Fig. 2.

Fig. 2
figure 2

The hierarchy indicators for inconsistency evaluation

First, the inconsistency is caused by the three series of indicators, which represent the features extracted from the first three current change points. At each current change point, there are three indicators, including range, standard deviation, and sum, to characterize the inconsistency. Then, at the bottom of the hierarchy structure, the range and standard deviation are represented by the statistic of the battery cell voltages and voltage drops, which denote the voltage inconsistency and internal resistance inconsistency, respectively. The sum value is calculated by the battery pack voltage. The weights from the second level to the bottom are allocated. First, due to the long charge time of large currents, and great current drops of the first two constant current charge period and the SOC at the first and second current change point is about 40% and 70%, respectively, which are the middle range. While the third charge period is in the high SOC range, and the charge time is short. Therefore, the weights of these three points are set to 0.4, 0.4, and 0.2. At the third level, three indicators are included, the range and standard deviation are calculated by the parameters of the battery cells, which directly reflect the voltage and internal resistance inconsistencies among battery cells. And, the standard deviation involves each battery property, while the range reflects the difference between the maximum and the minimum values. The battery pack voltage is measured by the pack terminal voltage, which is affected by the contact resistance, and only one indicator belongs to this category. Thus, the weights of the range, standard deviation, and sum are set to 0.4, 0.5, and 0.1, respectively. Finally, because the voltage inconsistency is reflected by cell voltages, and the internal resistance inconsistency is represented by cell voltage drops, the weights of voltage and voltage drop are set to 0.5 and 0.5, respectively. The total weights are listed in Table 1.

Table 1 Hierarchy weights of the features (w1).

3.1.2 Multi-Scale Entropy Weight

The AHP is based on subjective experience, which lacks the objective information of the features. On the contrary, sample entropy (SampEn) is an objective method to describe the characteristics of information distribution and discrete characteristics quantitatively and is widely used in a comprehensive evaluation of randomness and chaos [14]. The SampEn is the exact value of the negative average natural logarithm of the conditional probability and has good consistency [33]. Given a time series of length N, \(\left\{ {x_{i} :1 < i < N} \right\}\), the calculation is denoted as follows [33, 38].

Preset the embedded dimension m, reconstruct the N-m+1 vectors.

$$ {\varvec{X}}_{m} (i) = \left\{ {x(i + l):0 \le l \le m - 1} \right\},\quad i = 1,...,N - m + 1 $$
(1)

The distance between two such vectors is defined as the maximum absolute difference.

$$ d[{\varvec{x}}_{m} (i),{\varvec{x}}_{m} (j)] = {\text{max}}\left\{ {\left| {x(i + l) - x(i + l)} \right|:0 \le l \le m - 1} \right\} $$
(2)

Given the similar tolerance r, calculate the distance \(d\left[ {{\varvec{X}}_{m} (i),\;{\varvec{X}}_{m} (j)} \right]\) between \({\varvec{X}}_{m} (i)\) and \({\varvec{X}}_{m} (j)(j = 1,2...,N - m,j \ne i)\). Define

$$ B_{i}^{m} (r) = \frac{{C_{i}^{m} (r)}}{N - m - 1}, \, 1 \le i \le N - m,\;i \ne j $$
(3)
$$ B_{i}^{m + 1} (r) = \frac{{C_{i}^{m + 1} (r)}}{N - m - 1},{ 1} \le i \le N - m,i \ne j $$
(4)

where the \(C_{i}^{m} (r)\) is the number of vectors \({\varvec{X}}_{m} (j)\) that \(d[{\varvec{X}}_{m} (i),{\varvec{X}}_{m} (j)] < r\), for \({1} \le j \le N - m,j \ne i\). Similarly, the \(C_{i}^{m + 1} (r)\) is the number of vectors \({\varvec{X}}_{m + 1} (j)\) that \(d[{\varvec{X}}_{m + 1} (i),{\varvec{X}}_{m + 1} (j)] < r\), for \({1} \le j \le N - m,j \ne i\).

Define the mean of N-m, \(B_{i}^{m} (r)\) and \(B_{i}^{m + 1} (r)\) as:

$$ B^{m} (r) = \frac{1}{N - m}\sum\limits_{i = 1}^{N - m} {B_{i}^{m} (r)} $$
(5)
$$ B^{m + 1} (r) = \frac{1}{N - m}\sum\limits_{i = 1}^{N - m} {B_{i}^{m + 1} (r)} $$
(6)

Finally, the SampEn is defined as:

$$ {\text{SampEn}}(m,r,N) = - {\text{ln}}\left[ {\frac{{B^{m + 1} (r)}}{{B^{m} (r)}}} \right] $$
(7)

According to Eq. (7), the SampEn could restrain the noise whose amplitude is smaller than r. And, when a large disturbance appears, it would be removed by threshold detection. The SampEn describes the complexity of the original sequence from a single scale. When the complexity is high, the SampEn is large, which stands true otherwise. However, for many sequences, it would lose much important information if only a single scale is adopted. Therefore, to describe the complexity more accurately, Costa et al. [39] proposed multi-scale SampEn (MSE). First, coarse-grained segmentation of the original sequence forms the coarse-grained vector \(y_{k}^{\tau }\):

$$ y_{k}^{\tau } = \frac{1}{\tau }\sum\limits_{i = (k - 1)\tau + 1}^{k\tau } {x_{i} } , \, \;1 \le k \le \frac{N}{\tau } $$
(8)

where \(\tau = 1,2...,n\) is the scale factor, N is a positive integer. Then, calculate the SampEn of each coarse-grained vector. The MSE of x can be represented as the SampEn of \(y_{k}^{\tau }\):

$$ {\text{MSE}}(x,t,m,r) = {\text{SampEn}}(y_{k}^{\tau } ,m,r) $$
(9)

where r = 0.2Std (Std is the standard deviation of the original vector x); m = 2; τ = 5.

For the weight allocation of the inconsistency features, the MSE could be normalized as:

$$ w2_{ij} = \frac{{1 - {\text{MSE}}_{ij} }}{{\sum\nolimits_{j = 1}^{n} {\left( {1 - {\text{MSE}}_{ij} } \right)} }} $$
(10)

where w2ij denotes the MSE weight of the jth feature at the ith sample point.

3.1.3 Fusion Weight Assignment

The hierarchy weight and MSE weight represent the subjective and objective weight allocation, respectively. The weights of evaluation indexes determined by the MSE are entirely based on the relationship of data. But sometimes, the objective weights are different from reality. However, the weights determined by AHP are obtained from experts’ experience. But, it usually ignores the data information. So, a more reasonable way to measure weight value should be by AHP (subjective weights) and the MSE method (objective weights) together [40]. This paper gives fused weights of each feature in order to consider both methods, achieving a more comprehensive inconsistency evaluation.

Specifically, the features extracted from the partial charging process are used to form a 15-dimensional initial data matrix.

$$ {\varvec{U}}_{i} = \left[ {u_{i1} \, u_{i2} \, ... \, u_{i15} } \right] $$
(11)

where u is the feature value and i denotes the ith sample point. Then, they should be normalized. However, in order to support onboard utilization, the maximum and the minimum values of each feature during the aging process are unpredictable. Therefore, the normalization processes for the range, and the standard deviation is set as \(x_{ij} = {{u_{ij} } \mathord{\left/ {\vphantom {{u_{ij} } {u_{1j} }}} \right. \kern-\nulldelimiterspace} {u_{1j} }}\), and the sum is normalized as \(x_{ij} = {{u_{1j} } \mathord{\left/ {\vphantom {{u_{1j} } {u_{ij} }}} \right. \kern-\nulldelimiterspace} {u_{ij} }}\). The normalized features form the input matrix.

$$ {\varvec{X}}_{i} = \left[ {x_{i1} \, x_{i2} \, ... \, x_{i15} } \right] $$
(12)

Next, assign the hierarchy weight w1 according to Table 1, and allocate the MSE weight w2 referring to Eqs (110). After that, the fused weight is obtained by the weighted sum.

$$ w_{ij} = \alpha w1_{ij} + (1 - \alpha )w2_{ij} $$
(13)

where the \(\alpha\) is the fusion weight of the hierarchy weight and \((1 - \alpha )\) denotes the fusion weight of MSE weight. The MSE weight reflects more objective information, while the hierarchy weight considers the importance of each feature. So, \(\alpha\) is set to \(0.4\). Finally, the inconsistency evaluation index \(\xi\) of the battery pack is denoted by

$$ \xi_{i} = \sum\limits_{j = 1}^{n} {w_{ij} } x_{ij} $$
(14)

And, the inconsistency is described as the ratio of the current index value to the initial value:

$$ {\text{Incon}}(i) = \frac{{\xi_{i} }}{{\xi_{0} }} $$
(15)

The complete process of inconsistency evaluation based on fusion weights is listed in Table 2.

Table 2 The process of battery pack inconsistency evaluation

3.2 Method for SOH Estimation

Data-driven SOH estimation is advantageous as it is model-free, accurate, and robust. The GPR is applied for battery pack SOH estimation. In this section, the wrapper feature selection method is introduced and the GPR algorithm designed.

3.2.1 Feature Selection

Feature selection is a significant preprocess before machine learning to remove the unimportant and redundant features, which helps reduce the computational cost and get more accurate and reliable results. In the existing papers for battery SOH estimation, only correlation analysis is used to filter the features that has low correlation coefficients with battery capacity. However, some vital features may be removed and some redundant features left [35]. Therefore, in this paper, the wrapper method is adopted to select the extracted features.

The wrapper uses learning algorithms to evaluate the prediction quality of the selected feature subset. Typically, a search process is predefined in the space of possible feature subsets, generating various feature subsets to be evaluated [35]. The general process is select a subset, evaluate the subset according to the prediction performance, select another new subset, and continue to evaluate until the expected quality is reached. The search methods can be divided into three categories: complete search, sequence search, and random search [41]. The sequence search has high efficiency and can be divided into three categories: sequence forward search (SFS), sequence backward search (SBS), and two-way search (TWS) [42]. The SFS methods start with a feature and then evaluates the results with each additional feature until the results begin to deteriorate. The SBS contains all the features, reducing one feature per iteration until the results reach a preset threshold, and the search is stopped. The TWS methods combine the SFS and SBS until the preset threshold is reached. In this paper, the SBS is used for the subset searching.

The SBS searching process for SOH features is shown in Fig. 3. Firstly, the extracted features and the evaluated inconsistency are used to form a 16-dimensional initial input set (including the inconsistency estimation result) and evaluate the initial feature set by the root mean square (RMS) error (obtained by the trained model of the training data). Then, get into the first cycle, remove one feature, and evaluate the subset by training the regression model and calculating the RMS error. Next, find the minimum RMS error of this cycle. After that, continue the next search process if the subset is better than the former one. Otherwise, stop the cycle and determine the final subset. Finally, the selected subset is used for SOH estimation model training and testing. The feature evaluation method uses the same algorithm as the SOH regression model training, which is denoted in the following subsection.

Fig. 3
figure 3

The SOH feature subset selection process by wrapper

3.2.2 GPR Algorithm

The GPR-based regression model has many advantages, such as good flexibility and probabilistic prediction. It is therefore used here to estimate the battery pack SOH. The detailed description of GPR could refer to Ref. [43, 44].

Generally, the noise is assumed to be additive, independent, and Gaussian; therefore the relationship between input feature \(x\) and estimation \(y\) is given as follows:

$$ y = f(x) + \varepsilon ,\quad \varepsilon \sim {N}(0, \, \sigma_{n}^{2} ) $$
(16)

where \(\varepsilon\) is the white noise with a variance of \(\sigma_{n}^{2}\). \(f(x)\) is a latent function, which has a probability distribution as follows:

$$ f(x)\sim {GP}(m(x),k(x,x^{\prime})) $$
(17)

where \(m(x)\) is the mean function, and \(k(x,x^{\prime})\) is the covariance function.

$$ m(x) = E[f(x)] $$
(18)
$$ k(x,x^{\prime } ) = E\left[ {\left( {f\left( x \right) - m\left( x \right)} \right)\left( {f\left( {x^{\prime } } \right) - m\left( {x^{\prime } } \right)} \right)^{\text{T}} } \right] $$
(19)

Generally, the mean function is set to zero. The squared exponential covariance function is usually used as the kernel function. The function is expressed as follows:

$$ k_{\text{f}} = \sigma_{\text{f}}^{2} \exp \left( { - \frac{1}{2}(x - x^{\prime}} \right)l^{ - 1} (x - x^{\prime})) $$
(20)

where the covariance \(\sigma_{f}^{2}\) represents the output amplitude of the diagonal matrix, l is the characteristic length scale. Thus, the prior distribution of the observations of Eq (17) can be expressed as

$$ \varvec{y}\sim N\left( {0, \, {\varvec{K}}_{\text{f}} (x,x) + \sigma_{n}^{2} \varvec{I}_{n} } \right) $$
(21)

where \({\varvec{y}}\) is the observation series \([y_{1} ,y_{2} \, ... \, y_{n} ]\), \({\varvec{I}}_{n}\) is an n-dimensional unit matrix. There are three hyper-parameters \({\varvec{\theta}} = \left[ {\sigma_{f} , \, \sigma_{n} , \, l} \right]\) in the covariance matrix which can be optimized by maximizing the log-likelihood function expressed by

$$ \begin{aligned} L &= \log p\left( {{\varvec{y}}\left| {{\varvec{x}},} \right.{\varvec{\theta}} } \right) = - \frac{1}{2}\log \left( {\det ({\varvec{K}}_{\text{f}} ({\varvec{x}},{\varvec{x}}) + \sigma_{n}^{2} {\varvec{I}}_{n} } \right)) \hfill \\ & \quad -\, \frac{1}{2}{\varvec{y}}^{\text{T}} [{\varvec{K}}_{\text{f}} ({\varvec{x}},{\varvec{x}}) + \sigma_{n}^{2} {\varvec{I}}]^{ - 1} {\varvec{y}} - \frac{1}{2}\log 2\uppi \hfill \\ \end{aligned} $$
(22)

The conjugate gradient method is widely used for finding the parametric optimal solution. The idea is to find the maximum value of the objective function by obtaining the derivative of the log-likelihood function [27].

$$ \left\{ \begin{gathered} \frac{\partial }{{\partial {\varvec{\theta}}_{i} }}{\text{log}}p\left( {{\varvec{y}}\left| {{\varvec{x}},} \right.{\varvec{\theta}} } \right) = \hfill \\ \, \frac{1}{2}tr\left\{ {\left[ {{\varvec{\alpha}} {\varvec{\alpha}}^{\text{T}} - ({\varvec{K}}_{\text{f}} ({\varvec{x}},{\varvec{x}}) + \sigma_{n}^{2} {\varvec{I}}_{n} )^{ - 1} \frac{{\partial (K_{\text{f}} ({\varvec{x}},{\varvec{x}}) + \sigma_{n}^{2} I_{n} )}}{{\partial {\varvec{\theta}}_{i} }}} \right]} \right\} \hfill \\ {\varvec{\alpha}} = \left[ {{\varvec{K}}_{\text{f}} ({\varvec{x}},{\varvec{x}}) + \sigma_{n}^{2} {\varvec{I}}_{n} } \right]^{ - 1} {\varvec{y}} \hfill \\ \end{gathered} \right. $$
(23)

The joint prior distribution of the observations \({\varvec{y}}\) along with the estimated value \({\varvec{y}}^{*}\) at a test point \({\varvec{x}}^{*}\) is expressed by

$$ \left[ \begin{gathered} {\varvec{y}} \hfill \\ {\varvec{y}}^{ * } \hfill \\ \end{gathered} \right]\sim N\left( {0, \, \left[ \begin{gathered} {\varvec{K}}_{\text{f}} ({\varvec{x}},{\varvec{x}}) + \sigma_{n}^{2} {\varvec{I}}_{n} \, {\varvec{K}}_{\text{f}} ({\varvec{x}},{\varvec{x}}^{ * } ) \hfill \\ {\varvec{K}}_{\text{f}} ({\varvec{x}},{\varvec{x}}^{ * } )^{\text{T}} \, {\varvec{K}}_{\text{f}} ({\varvec{x}}^{ * } ,{\varvec{x}}^{ * } ) \hfill \\ \end{gathered} \right]} \right) $$
(24)

Then the estimation task can be carried out by the posterior distribution derived as

$$ p({\varvec{y}}^{*} \left| {{\varvec{x}},{\varvec{y}},{\varvec{x}}^{*} } \right.) = N({\varvec{y}}^{*} \left| {\overline{\varvec{y}}^{*} ,{\text{cov}} ({\varvec{y}}^{*} )} \right.) $$
(25)

where the estimated mean value \(\overline{\user2{y}}^{*}\) and the estimated covariance value \({\text{cov}} ({\varvec{y}}^{*} )\) are expressed as follows:

$$ \overline{\varvec{y}}^{*} = {\varvec{K}}_{\text{f}} ({\varvec{x}},{\varvec{x}}^{*} )^{\text{T}} \left[ {{\varvec{K}}_{\text{f}} ({\varvec{x}},{\varvec{x}}) + \sigma_{n}^{2} {\varvec{I}}_{n} )} \right]^{ - 1} {\varvec{y}} $$
(26)
$$ \begin{aligned} {\text{cov}} ({\varvec{y}}^{*} ) &= {\varvec{K}}_{\text{f}} ({\varvec{x}}^{*} ,{\varvec{x}}^{*} ) - {\varvec{K}}_{\text{f}} ({\varvec{x}},{\varvec{x}}^{*} )^{\text{T}} \\ & \quad *\left[ {{\varvec{K}}_{\text{f}} ({\varvec{x}},{\varvec{x}}) + \sigma_{n}^{2} {\varvec{I}}_{n} )} \right]^{ - 1} {\varvec{K}}_{\text{f}} ({\varvec{x}},{\varvec{x}}^{*} ) \\ \end{aligned} $$
(27)

The estimation value and the uncertainty are represented by the mean \(\overline{\user2{y}}^{*}\) and covariance \({\text{cov}} ({\varvec{y}}^{*} )\), respectively.

3.3 Joint Estimation Design

The joint estimation system of inconsistency and SOH for battery packs is designed as shown in Fig. 4. Specifically, first in the data processing process, the data collection module collects the test data from the charging process, and the feature extraction module extracts the features from current change points. Then, the extracted features are input both for the inconsistency evaluation and feature selection. In the battery pack inconsistency evaluation process, the weights are allocated by AHP and MSE, respectively, and then the fusion weights are obtained by fusing these two weights. Next, the weights of all the features are combined with the battery cell inconsistency features to evaluate the battery pack inconsistency. The estimated value is used to evaluate the inconsistency condition and input as health indicators to the feature selection process. Then, the wrapper is used to select the optimal feature subset for model training. In the battery pack SOH estimation process, the hyper-parameters are optimized, and the regression model is trained by GPR. When the model is completely trained, the SOH can be estimated using the extracted features from the later cycles. Finally, an error evaluation module is used to assess the estimation performance. In this paper, four indicators including 95% confidence interval, relative error, maximum absolute error (MAE), and RMS error are used to assess the accuracy and reliability of the SOH estimation results. The 95% confidence interval and RMS error are calculated by Eqs (28) and (29), respectively.

$$ 95\% {\varvec{CI}} = \hat{y} \pm 1.96\,\sigma^{2} (\hat{y}) $$
(28)
$$ {RMSE} = \sqrt {\frac{1}{N}\sum\nolimits_{i = 1}^{N} {\left( {\hat{y}_{i} - y_{i} } \right)^{2} } } $$
(29)

where \(95\% {\text{CI}}\) is the confidence interval; \(\hat{y}\) and \(\sigma^{2} (\hat{y})\) are the predicted value and variance, respectively; \(N\) is the size of the testing set; \(y_{i}\) is the standard value. Then, in the following section, the joint estimation results will be demonstrated and assessed in detail.

Fig. 4
figure 4

Battery pack joint inconsistency and SOH estimation system

4 Results and Discussions

In this section, the joint estimation results of battery pack inconsistency and SOH are given and evaluated. First, the inconsistency estimation results are demonstrated in Sect. 4.1, and SOH results are shown in Sect. 4.2. Then, a discussion part is followed in Sect. 4.3.

4.1 Inconsistency Evaluation Results

The battery pack inconsistency evaluation results based on fusion weights allocation are shown in Fig. 5. The initial battery pack inconsistency is set as 1. In Fig. 5a, the fusion weights-based battery pack inconsistency evaluation results are compared with the AHP-based and MSE-based evaluation results. It shows that inconsistency evaluations do not have obvious differences in the early stages of aging cycles, which indicates that all three methods could be applied for inconsistency evaluation during this period. However, with aging cycles increasing, the results of the AHP-based method and MSE-based method differ gradually. For instance, the time when the evaluation results of the AHP-based method exceeded 5 are 24 cycles earlier than that of the MSE-based method, which is about 3.5% of the total cycles. In this period, the fusion-based methods could give more comprehensive results based on these two methods which reflect both the subjective and objective evaluation of inconsistencies. Moreover, the fusion-based method shows smoother evaluation results. Figure 5b gives the fusion-based inconsistency evaluation results and the standard battery pack SOH varying with aging cycles. It shows that the inconsistency has an overall upward trend, while the SOH presents an overall downward trend. What’s more, both the inconsistency and the SOH show a trend of slow change in the early period and accelerated change in the later period. The Pearson correlation coefficient between the estimated inconsistency and the standard SOH is 98.29%. This suggests that the fusion-based battery pack inconsistency estimation method accurately tracks the aging conditions. Figure 5c shows the inconsistency value and SOH at some specific points in order to assess the effect of inconsistency on the aging process for battery packs. The estimated inconsistency enlarged from 1 to 1.41 during the first 100 cycles but it just increased by 0.35 from 100 to 300 cycles. Regarding the variation of SOH, it reduced from 100% to 96.54% during the first 100 cycles, while it just decreased by 1.69% from 100 to 300 cycles. The results show that inconsistency variation has an obvious impact on the aging process. And during later aging cycles, the inconsistency increment accelerated and the SOH decrement accelerated correspondingly. During the final 90 cycles, the inconsistency enlarged from 4.07 to 6.41, and the SOH rapidly reduced from 85.49% to 80.91%. This indicates that the accelerated enlargement of inconsistency seriously affects the health of battery packs. According to the inconsistency and the SOH, an optimal inconsistency degree grading is given in Table 3. In this rank method, the four stages range from 80% to 100% with 5% SOH intervals. Therefore, the inconsistency evaluation degree could also be used to represent the health status of the battery pack. The above results show that the inconsistency estimation method proposed in this paper, which extracts features from the current change points and allocate fusion weights based on AHP and MSE, could effectively evaluate the battery pack inconsistency during the entire aging cycles. Due to the simpleness of this feature extraction method, the computation cost of each inconsistency evaluation is just 0.14 s (calculated on MATLAB 2018b). Therefore, this method shows good potential for onboard utilization.

Fig. 5
figure 5

Battery pack inconsistency evaluation: a inconsistency estimation based on the AHP, MSE and fusion based method; b curves of the estimated inconsistency and standard SOH varies with cycles; c the estimated inconsistency and standard SOH at different cycle numbers

Table 3 Degree of battery pack inconsistency

4.2 SOH Estimation Results

In this subsection, the SOH estimation results based on wrapper feature selection and GPR prediction are evaluated. The data from early cycles are used for model training, and the remaining data are used for validation. In Fig. 6, the battery pack SOH estimation results are presented, where 50% of the test data is used for model training, and the rest is used for estimation. Figure 6a–c represent the results estimated by the initial feature set, the feature set selected by the filter method, and the feature set selected by the wrapper method, respectively. The filter method removes the features whose Pearson correlation coefficient with SOH is lower than 0.9. The initial feature set contains 16 features, the filtered feature set includes 11 features, and the feature set selected by the wrapper also involves 11 features. The features of each subset are listed in Table 4. It shows that although filter and wrapper sets both have 11 features, the specific features are quite different. The confidence intervals have no significant difference and all are narrow, which indicates the GPR models are all well trained for estimation, and the results are reliable. But, there are some notable comparisons in the relative error curves. The ranges of the axes of the three error curves are set consistently. It is obvious that the error range that is estimated by the feature set where the wrapper is used for feature selection, is narrower than the other two, which means the estimation results are more reliable. The estimation results in Fig. 6a and b get far away from the standard SOH near the end of life, while the results on Fig. 6c which is estimated by the joint estimation method, get closer to the standard value. This is important for the health management of battery packs, which could precisely report the health condition and guide the retirement. The MAE and RMS errors are listed in Table 5. It shows that these two errors obtained by the proposed method are smaller than those obtained by other methods, only 1.54 % for MAE and 0.70% for RMS error. The calculation time for each estimation cycle is 0.15 s. This proves that this method could realize the accurate battery pack SOH estimation in EVs/PHEVs. The estimation results obtained by the feature set without the evaluated inconsistency are evaluated. The results are shown in Fig. 6d, the MAE and RMS error of all the methods is listed in Table 5. The confidence interval and relative error in Fig. 6d are both larger than that in Fig. 6c obviously, the MAE and RMS error also get larger. This also proves that the comprehensively evaluated inconsistency degree is a credible HI for SOH estimation.

Fig. 6
figure 6

Battery pack SOH estimation and relative errors based on a initial features; b features selected by filter; c features selected by wrapper-based method; d feature set selected by wrapper (without evaluated values of inconsistency)

Table 4 Features in the selected by different methods
Table 5 Battery pack SOH Estimation MAE (%) and RMS error (%)

To further evaluate the estimation performance and the robustness of generalization ability, the training set with different sizes are studied. The results are demonstrated in Fig. 7. Figure7a and b show the estimation results and relative errors obtained in three feature subsets, where 70% data and 10% data are used for model training, respectively. It shows in Fig. 7a that when more data are used for more training, the results are more accurate for all the three methods. However, the results estimated by the initial feature set and filter selected feature set deviate gradually from the standard SOH with increasing cycles. Nevertheless, this can be avoided when the feature set is selected by the wrapper method, and the estimation is closer to the standard value. The accuracy increment of the filter-based method is lower than that of the other two, which indicates some important features with a low correlation coefficient have been removed. The MAE and RMS error of the proposed joint estimation method is only 0.72% and 0.25%, respectively. Then, 10% of data are used for model training to see the estimation performance with a little amount of data, the results are shown in Fig. 7b. It shows obvious differences, where the first two methods give weak estimations, while the proposed method could still track the standard SOH with satisfactory accuracy. The MAE and RMS errors of the first two methods are close to 10% and 4%, while that of the proposed method is still 2.58% and 0.93%. The confidence interval covers the standard SOH during the entire aging process, which signifies the estimations are reliable. The results suggest that when little data are used for training, the proposed method could remain accurate and reliable.

Fig. 7
figure 7

Battery pack SOH estimation and error results with a 70% b 10% data for model training using different feature

4.3 Discussion

The battery pack inconsistency and SOH estimations are obtained and evaluated. The joint method is first designed, achieving satisfactory results. In this subsection, we have a short discussion on the proposed method and its estimation performance. It shows that the proposed inconsistency evaluation method can reflect the aging conditions of battery and that the inconsistency is an important factor affecting the aging rate. Here, the inconsistency feature of battery cells can also detect which cell shows the greatest inconsistency and therefore can provide a reference for which cell needs to be replaced. Moreover, the features are extracted from the current change points in the charging process, and the SOC range is about 40%-80% that coincides with the range of every charging process. This enables the evaluation during each charging process, eliminating complicated calculations, and ignoring the start point. The inconsistency is graded according to its relationship with SOH and thus could also be used to estimate the health states. As for the battery pack SOH estimation, the evaluated inconsistency is input together with the extracted features. The wrapper can help select the optimal feature subset that guarantees the accuracy and reliability of the estimation performance. The results show that even only 10% of data is used for model training, MAE and RMS errors only are 2.58% and 0.93%, respectively. And, the wrapper selection method has great potential to improve the accuracy of SOH estimation by about 75.8% compared to the traditional filter method. The computational cost is also small with the joint estimation system only taking about 0.29 s for one cycle.

5 Conclusions

Battery pack inconsistency and state of health are two key characteristics that need to be accurately estimated in the battery management system. A novel joint estimation method of these two states is designed. With its ease of feature extraction, this method has the potential for onboard usages. From the inconsistency evaluations, it is verified the fusion weights-based inconsistency estimation method is advisable and the results prove that the battery packaging process is strongly related to the inconsistency. As for the SOH estimation, the wrapper feature selection method is verified to be able to improve both the accuracy and reliability of the estimation. Based on the proposed novel method, even only a small amount of data is used for training, satisfying results could still be obtained and give more accurate estimations near the end of life, which is significant for health management and reutilization. The computational cost of the joint estimation system is low, thus suitable for onboard utilizations. The satisfactory joint estimation results imply an interesting and promising research direction. This demonstrates potentials in applications by detecting a few partial charging profiles. However, there are still some limitations: the battery pack only consists of four cells, and the features may not suit other charging policies. In future work, this method will be applied to EV battery packs.