1 Introduction

The rapid development of lithium-ion batteries (LIBs) has enabled an increasing electrification of transportation systems, in particular electric vehicles (EVs). Many countries have enacted policies to stimulate the adoption of EVs in order to reduce greenhouse gas emissions and save non-renewable energy [1].

Battery energy [2], charging [3] and thermal [4] management are important components of the battery management system (BMS). To ensure that a LIB operates safely and reliably, its state of health (SOH) is estimated to assess its working condition [5,6,7]. According to the IEEE 1188-1996 standard, the SOH of a LIB can be defined as:

$$\begin{aligned} SOH = \frac{C_\mathrm{{now}}}{C_\mathrm{{new}}}\times {100\%}, \end{aligned}$$
(1)

where \(C_\mathrm{{now}}\) and \(C_\mathrm{{new}}\) are the current and the nominal capacity of the LIB, respectively.
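As a minimal illustration of Eq. (1), the SOH computation can be expressed in a few lines of code; the capacity values in the example are hypothetical:

```python
def soh(c_now: float, c_new: float) -> float:
    """State of health (%) as the ratio of current to nominal capacity, Eq. (1)."""
    return c_now / c_new * 100.0

# Example: a nominal 2 Ah cell whose measured capacity has faded to 1.4 Ah.
print(soh(c_now=1.4, c_new=2.0))  # 70.0 (%)
```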

To predict the SOH of a LIB, the internal resistance and capacity are often used as characteristics of aging [8,9,10], although other indicators have been explored, such as the solid-phase diffusion time of lithium ions in the positive electrode [11] or the amount of cyclable lithium ions [12]. By means of capacity, DC pulse or electrochemical impedance spectroscopy (EIS) [13, 14] tests, among others, the LIB health parameters, such as voltage, current, temperature and charging time, are obtained and can be used directly in distinct methods to predict the SOH. For example, the capacity can be obtained by measuring the charge transferred through the battery during the charging or discharging phases [15], and the internal resistance can be determined by calculating the instantaneous voltage drop during a pulse test [16]. However, existing direct methods have limitations in practical applications. They require very accurate sensor measurements, the battery must be put out of service for testing (e.g., in capacity or EIS tests), and some methods are restricted to specific test systems (e.g., DC pulses with high currents are not allowed by the BMS, since they are seen as abnormal operating conditions).
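The two direct measurements just mentioned can be sketched as follows. This is an illustrative outline under simple assumptions (constant sampling period, charging current logged as positive), not the test procedure of any specific BMS:

```python
import numpy as np

def capacity_ah(current_a: np.ndarray, dt_s: float) -> float:
    """Capacity by coulomb counting: integrate |current| over a full (dis)charge."""
    return float(np.sum(np.abs(current_a)) * dt_s / 3600.0)  # A*s -> Ah

def pulse_resistance_ohm(v_before: float, v_after: float, i_pulse_a: float) -> float:
    """Internal resistance from the instantaneous voltage drop of a current pulse."""
    return (v_before - v_after) / i_pulse_a
```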

In the last few years, several methods have been proposed and applied to SOH estimation, yielding real-time assessment of LIBs [17]. These approaches can be broadly divided into model-based and data-driven methods. The most common model-based approaches rely on electrochemical models or equivalent circuit models (ECMs) [18,19,20,21,22]. Electrochemical models can accurately describe the dynamics of LIBs, but the modeling process is complicated and the required computational cost is high, which hinders their use in practical applications. Conversely, ECMs are easier to obtain and involve a smaller computational burden; however, their accuracy strongly depends on the values of the models' parameters, which are difficult to estimate. Due to the complexity and limitations of model-based approaches, data-driven techniques have gained interest in SOH estimation. Indeed, a variety of machine learning (ML) algorithms have been proposed, namely artificial neural networks (ANNs) [23, 24], support vector machines (SVMs) [25, 26] and relevance vector machines (RVMs) [27, 28]. In terms of structure, the most recent versions rely on deep learning techniques [29]. Among ML methods, ANNs stand out for their accuracy, adaptability, generalization capability and robustness [30]. For example, a nonlinear autoregressive scheme with exogenous input was proposed in [31], while an extreme learning machine (ELM) was used in [32] for SOH estimation.

Choosing the input data is the first step to consider before training an ML method. Therefore, the dataset to be used as input to the back-propagation neural network (BPNN) is worth exploring. The battery data related to aging are either external or internal. The external data include, for example, temperature, charging and discharging rates, and depth of discharge [17]. The internal data refer mainly to physical and chemical properties of the LIB, such as the growth of the solid electrolyte interphase (SEI) layer, self-discharge, and decomposition of the anode [33]. In real-world applications, the BMS sensors can collect data such as the battery terminal voltage, current, surface temperature and time of charge and discharge. Using multiple types of data is beneficial to improve the accuracy of the SOH estimation. Current, voltage and temperature have often been used as input to ML algorithms [31, 34]. However, correlation analysis reveals that the charging time of the LIB is even more closely related to the SOH than those variables. Therefore, using the charging time as input to the BPNN is crucial for high-accuracy estimation of the SOH.

The BPNN is one of the most convenient types of ANN for estimation purposes due to its capabilities of nonlinear mapping, self-learning, adaptability, generalization and fault tolerance. Several methods have been applied to improve the performance of existing BPNNs [35,36,37,38] and have shown promising results. However, in previous research, the BPNN structure has not been fully explored, especially with regard to the number of hidden layer units. Indeed, most BPNNs in the references above use a fixed number of hidden layer units or set this number to a value in a given interval [39, 40]. A BPNN with a fixed number of hidden layer units may yield excellent results for a given training set, but behave poorly when the training data change. For example, it was shown in [41] that, for an optimal fractional order of 7/9 and a learning rate of 0.5, the prediction accuracy of a BPNN with 500 hidden layer units was lower than that obtained with 300 or 200 hidden layer units, whereas for a learning rate of 1, the number of hidden layer units had to be set to 500 to obtain the best prediction. Therefore, it is crucial to have a method that finds the optimal number of hidden layer units, so that the BPNN exhibits the best performance as the training data change.

In this paper, an SOH prediction method based on a BPNN with an adaptive hidden layer is proposed. The method computes the mean absolute error (MAE) of the prediction for a training dataset and chooses the number of hidden layer units that minimizes the MAE. Compared with other methods, the proposed scheme is capable of determining the optimal number of hidden layer units during the training phase, instead of fixing its value after experimentation, thus reducing the time required for setting up the network and improving the accuracy of the SOH prediction. The main contributions of the paper are:

  1) The proposed method determines the optimal BPNN structure adaptively during the training process, relying only on the results obtained for a training dataset. This differs from other neural network algorithms, which adjust the hyperparameters after each experiment;

  2) The charging time of the LIB is employed, along with the voltage, current and temperature, as input to the BPNN;

  3) The new BPNN is applied to four distinct LIBs and different training datasets, and its prediction results are compared with those obtained with two other BPNNs. It is shown that the proposed scheme, by adaptively changing the structure of the network according to the dataset, leads to superior SOH prediction;

  4) The proposed BPNN is shown to require only 50% of the data for training, yielding a mean absolute percentage error (MAPE) below 0.8%.

The remainder of the paper is organized as follows. Section 2 introduces the LIB datasets and the parameters used for SOH estimation. Section 3 presents the new BPNN algorithm. Section 4 shows the experimental results obtained with the proposed BPNN. Section 5 draws the conclusions.

2 Datasets and features for SOH estimation

Four battery aging datasets from the NASA Ames Prognostics Center of Excellence (PCoE) repository are used in the remainder of this paper [42]. The NASA batteries are \(LiNi_{0.8}Co_{0.15}Al_{0.05}O_{2}\) cells with a capacity of 2 Ah. Each LIB {B0005, B0006, B0007, B0018} has a distinct aging trend, as shown in Fig. 1.

Fig. 1: Aging trend of four NASA batteries

The datasets include charge and discharge parameters, namely voltage, current, temperature and cycle time. Every cycle follows the constant current (CC)-constant voltage (CV) protocol for charging, and the CC protocol for discharging, as depicted in Fig. 2. The charge and discharge processes in one cycle have the following phases: (i) the current is set at 1.5 A until the voltage reaches 4.2 V; (ii) the charging current decreases while the voltage is kept at 4.2 V; (iii) the charging current drops to 20 mA, signaling that the charging process is over; (iv) a constant current of 2 A is set to discharge the LIB until the voltage drops to the cut-off voltage.
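A simple way to recover the charging time used later as a feature is to segment a cycle's current log according to the phases above. The sketch below assumes charging current is logged as positive and uses the 20 mA cut-off of phase (iii); it is an illustration, not the authors' preprocessing code:

```python
import numpy as np

def charging_time_s(t_s: np.ndarray, i_a: np.ndarray, cutoff_a: float = 0.020) -> float:
    """Duration of the CC-CV charge: first to last sample above the cut-off current."""
    charging = i_a > cutoff_a
    if not charging.any():
        return 0.0
    idx = np.flatnonzero(charging)
    return float(t_s[idx[-1]] - t_s[idx[0]])
```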

Fig. 2: Charge and discharge process in one cycle of the NASA LIBs

Figures 3a, 4a and 5a represent the voltage, current and temperature versus time for battery B0005, while Figs. 3b, 4b and 5b depict the evolution of those parameters during the 50-th cycle of the four batteries {B0005, B0006, B0007, B0018}. From Figs. 3 and 4 we verify that, as the number of charging and discharging cycles increases, the voltage of a given battery rises faster and reaches the CV charging step earlier. Correspondingly, the current leaves the CC step sooner. On the other hand, in the same cycle of different batteries, although the changes of voltage and current differ slightly, their trends and trajectories are roughly similar. Figure 5 shows that the temperature of the LIB increases with the number of charging and discharging cycles: the peak temperature is reached faster and the average temperature increases. Moreover, the temperature variation of the different batteries is similar, with the exception of B0018, which exhibits a sudden change in temperature at the beginning of the charging and discharging phases.

Fig. 3: Voltage variation in different circumstances

Fig. 4: Current variation in different circumstances

Fig. 5: Temperature variation in different circumstances

Fig. 6: Correlation analysis results for health parameters using GRA

Based on the above discussion, one can see that the charging time is related to the aging state of the LIB. To explore this relationship more closely, the grey relational analysis (GRA) method is applied here. This method can quantify the correlation between parameters even when very little information is available [43]. The GRA results are shown in Fig. 6. We verify that the charging time of the LIB is more closely related to the SOH than the voltage, current and temperature. Therefore, in the remainder of this paper, the charging time is adopted as input to the BPNN, while the LIB capacity of each charge-discharge cycle is selected as the output.
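For reference, a compact sketch of the GRA computation is given below: the grey relational grade between the SOH (reference sequence) and each health parameter (comparison sequence) is obtained by averaging the grey relational coefficients, with the usual distinguishing coefficient \(\rho =0.5\). The exact preprocessing behind Fig. 6 is not detailed here, so this is an assumption-laden outline:

```python
import numpy as np

def grey_relational_grade(ref: np.ndarray, feat: np.ndarray, rho: float = 0.5) -> float:
    """Grey relational grade of `feat` with respect to `ref` (both 1-D sequences)."""
    norm = lambda z: (z - z.min()) / (z.max() - z.min())  # min-max scaling, cf. Eq. (2)
    delta = np.abs(norm(ref) - norm(feat))                # point-wise deviation sequence
    xi = (delta.min() + rho * delta.max()) / (delta + rho * delta.max())
    return float(xi.mean())                               # average relational coefficient
```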

3 Methods

3.1 The BPNN

The structure of the BPNN is shown in Fig. 7. Let us consider that the training dataset is given by \(D=\left\{ (x_{1}, y_{1}), (x_{2}, y_{2}), \ldots , (x_{n}, y_{n}) \right\}\), with \(x_{i}\in {\mathbb {R}} ^{a}\) and \(y_{i}\in {\mathbb {R}} ^{b}\), where n is the number of samples in the dataset, \(x_{i}\) and \(y_{i}\) are the input and output of the i-th sample, \(i=1, 2, \ldots , n\), and a and b represent the dimensions of the input and output vectors, respectively.

Fig. 7: The structure of the BPNN

We normalize all parameters by:

$$\begin{aligned} z_{i} ^{'} =\frac{z_{i}-z_\mathrm{{min}} }{z_\mathrm{{max}}- z_\mathrm{{min}}}, \end{aligned}$$
(2)

where \(z_{i} ^{'}\) and \(z_{i}\) represent the normalized and non-normalized values, and \(z_\mathrm{{max}}\) and \(z_\mathrm{{min}}\) denote the maximum and minimum values of the data, respectively.
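Equation (2) corresponds to standard min-max scaling, applied per parameter; a one-line helper suffices:

```python
import numpy as np

def min_max_normalize(z: np.ndarray) -> np.ndarray:
    """Scale a data column to [0, 1] according to Eq. (2)."""
    return (z - z.min()) / (z.max() - z.min())
```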

The output of the hidden layer is:

$$\begin{aligned} h_{j} =g\left( \sum _{i=1}^{a}w_{ij}x_{i} -d_{j} \right) , \quad j=1,2,\ldots ,c, \end{aligned}$$
(3)

where g denotes the activation function of the hidden layer, \(h_{j}\) is the output of the j-th hidden node, c is the number of hidden layer nodes, \(w_{ij}\) represents the weight connecting the i-th input node to the j-th hidden node, and \(d_{j}\) is the threshold of the j-th hidden node.

The output of the output layer is given by:

$$\begin{aligned} {\widehat{y}}_{k} =f\left( \sum _{j=1}^{c}h_{j}w_{jk}-d_{k} \right) , \quad k=1,2,\ldots ,b, \end{aligned}$$
(4)

where \({\widehat{y}}_{k}\) represents the prediction of the k-th node, f is the activation function of the output layer, \(w_{jk}\) is the weight connecting the hidden and output layers, and \(d_{k}\) is the threshold of the k-th output node.
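Equations (3) and (4) amount to the following forward pass for a single sample. Sigmoid activations for g and f are an assumption, as they are not specified above:

```python
import numpy as np

def sigmoid(u: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-u))

def forward(x, W_in, d_hid, W_out, d_out):
    """x: (a,) input; W_in: (a, c); d_hid: (c,); W_out: (c, b); d_out: (b,)."""
    h = sigmoid(x @ W_in - d_hid)        # Eq. (3): hidden layer output
    y_hat = sigmoid(h @ W_out - d_out)   # Eq. (4): network prediction
    return h, y_hat
```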

According to the model of the BPNN, the mean square error (MSE) of the sample \(\left( x_{l},y_{l} \right)\) is calculated from the actual and predicted values over the b output nodes, that is:

$$\begin{aligned} E_{l}=\frac{1}{2}\sum _{m=1}^{b}\left( {\hat{y}}_{m}^{l}-y_{m}^{l} \right) ^{2}. \end{aligned}$$
(5)

The BPNN is an iterative learning algorithm, where the updated estimator for any parameter v is given by:

$$\begin{aligned} v'=v+\Delta v. \end{aligned}$$
(6)

After the training target error is given, the BPNN uses the gradient descent method to adjust the weights of the network. Given the learning rate \(\eta\), the update formulas for the weights are:

$$\begin{aligned} w_{ij}'= & {} w_{ij}-\eta \frac{\partial E_{l} }{\partial w_{ij} }, \end{aligned}$$
(7)
$$\begin{aligned} w_{jk}'= & {} w_{jk}-\eta \frac{\partial E_{l} }{\partial w_{jk} }, \end{aligned}$$
(8)

where \(w_{ij}'\) and \(w_{jk}'\) are the updated weights.

The purpose of the BPNN training is to minimize the accumulated error over the training set D, which is expressed as:

$$\begin{aligned} E=\frac{1}{n}\sum _{l=1}^{n}E_{l}. \end{aligned}$$
(9)
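Putting Eqs. (5)-(9) together, one gradient-descent update per sample may be sketched as follows, reusing the `forward` helper above (again under the sigmoid assumption; the threshold update signs follow the minus signs in Eqs. (3) and (4)):

```python
import numpy as np

def train_step(x, y, W_in, d_hid, W_out, d_out, eta=0.001):
    """One per-sample update of the weight/threshold arrays in place; returns E_l, Eq. (5)."""
    h, y_hat = forward(x, W_in, d_hid, W_out, d_out)
    err = y_hat - y                               # prediction error
    g_out = err * y_hat * (1.0 - y_hat)           # output-layer local gradient
    g_hid = (W_out @ g_out) * h * (1.0 - h)       # gradient back-propagated to hidden layer
    W_out -= eta * np.outer(h, g_out)             # Eq. (8)
    d_out += eta * g_out                          # threshold update (minus sign in Eq. (4))
    W_in -= eta * np.outer(x, g_hid)              # Eq. (7)
    d_hid += eta * g_hid
    return 0.5 * float(np.sum(err ** 2))          # sample error E_l, Eq. (5)
```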

3.2 The BPNN with adaptive hidden layer

In most BPNNs, the number of hidden layer nodes is fixed [39, 44]. Although the network can perform well for a given number of nodes and training dataset, its performance may degrade severely as the training dataset changes, and the BPNN model then needs to be re-designed. Indeed, when the network has too many hidden layer nodes, over-fitting may occur if the training data carry insufficient information, or the training time may become excessive otherwise. Conversely, when the network has too few neurons in the hidden layer, under-fitting results.

Fig. 8: The process of the proposed SOH estimation method

The maximum number of hidden layer nodes is bounded by the empirical formula:

$$\begin{aligned} h_\mathrm{{max}}< \sqrt{r+s}+\varrho , \end{aligned}$$
(10)

where \(h_\mathrm{{max}}\) is the maximum number of hidden layer nodes, r and s denote the numbers of input and output nodes, respectively, and \(\varrho\) is a constant less than 10 [45].
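Equation (10) gives only an upper bound; a helper returning the largest admissible integer could read as follows (\(\varrho = 9\) is an illustrative choice for the constant):

```python
import math

def max_hidden_nodes(r: int, s: int, rho: float = 9.0) -> int:
    """Largest integer h_max satisfying Eq. (10): h_max < sqrt(r + s) + rho."""
    bound = math.sqrt(r + s) + rho
    return math.ceil(bound) - 1

# Example: 4 inputs (voltage, current, temperature, charging time), 1 output.
print(max_hidden_nodes(4, 1))  # 11
```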

Herein, the MAE is used to quantitatively evaluate the accuracy on the training set, that is,

$$\begin{aligned} \mathrm{{MAE}}=\frac{1}{n}\sum _{i=1}^{n} \left|y_{i}-{\hat{y}}_{i} \right|, \end{aligned}$$
(11)

where \(y_{i}\) and \({\hat{y}}_{i}\) denote the real and predicted values, respectively. Usually, there is a positive correlation between the accuracy on the training set and on the test set [46], meaning that high (low) accuracy on the training set often leads to high (low) accuracy on the test set. Therefore, we vary the number of hidden layer nodes of the BPNN in the interval \([1, h_\mathrm{{max}}]\) and compute the training set error for each value. We denote the error of the last calculation as \(\mathrm{{MAE}}_\mathrm{{new}}\), the corresponding number of hidden layer nodes as \(h_\mathrm{{new}}\), the minimum error over all previous calculations as \(\mathrm{{MAE}}_\mathrm{{previous}}\), and the corresponding number of hidden layer nodes as \(h_\mathrm{{previous}}\). The optimal number of hidden layer nodes is then given by:

$$\begin{aligned} h_\mathrm{{best}} =\left\{ \begin{array}{ll} h_\mathrm{{previous}}, &{} \mathrm{{MAE}}_\mathrm{{new}}\ge \mathrm{{MAE}}_\mathrm{{previous}}, \\ h_\mathrm{{new}}, &{} \mathrm{{MAE}}_\mathrm{{new}}< \mathrm{{MAE}}_\mathrm{{previous}}. \end{array}\right. \end{aligned}$$
(12)

The proposed BPNN with an adaptive number of hidden layer nodes is depicted in Fig. 8. The algorithm is given as follows (a code sketch follows the list):

  1. Initialization: set the parameters, namely the numbers of nodes of the input and output layers, the training epochs and the learning rate;

  2. Calculate the maximum number of hidden layer nodes \(h_\mathrm{{max}}\) by means of the empirical formula (10), and let the number of hidden layer nodes range from 1 to \(h_\mathrm{{max}}\);

  3. Train the BPNN with each number of hidden layer nodes and calculate the corresponding MAE;

  4. Choose the number of hidden layer nodes that corresponds to the minimum MAE as the best hidden layer size, \(h_\mathrm{{best}}\);

  5. Use the training samples to train the BPNN with the optimal number of hidden layer nodes, thus creating the best-fitting network;

  6. Use the created network to forecast the test samples.
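A minimal end-to-end sketch of the search (steps 2-4) is given below; `train_bpnn` stands in for the training loop of Sect. 3.1 and is assumed to return a callable predictor, so its interface is hypothetical:

```python
import numpy as np

def select_best_hidden(X_train, Y_train, h_max, train_bpnn):
    """Return h_best per Eq. (12): the node count minimizing the training MAE."""
    best_h, best_mae = 1, np.inf
    for h in range(1, h_max + 1):                               # steps 2-3
        model = train_bpnn(h, X_train, Y_train)                 # train with h hidden nodes
        mae = float(np.mean(np.abs(Y_train - model(X_train))))  # training MAE, Eq. (11)
        if mae < best_mae:                                      # Eq. (12): keep the smaller MAE
            best_h, best_mae = h, mae
    return best_h
```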

4 Results

In this section, the effectiveness and generalization capability of the proposed BPNN are analyzed and discussed.

Fig. 9: Results using 70% of the data as the training dataset: a B0005, b B0006, c B0007, d B0018

Fig. 10: The RMSE of the four batteries trained with 70% of the data

Fig. 11: The MAPE of the four batteries trained with 70% of the data

Fig. 12: Results using 60% of the data as the training dataset: a B0005, b B0006, c B0007, d B0018

Fig. 13: The RMSE of the four batteries trained with 60% of the data

Fig. 14: The MAPE of the four batteries trained with 60% of the data

Fig. 15: Results using 50% of the data as the training dataset: a B0005, b B0006, c B0007, d B0018

Fig. 16: The RMSE of the four batteries trained with 50% of the data

Fig. 17: The MAPE of the four batteries trained with 50% of the data

The health parameters mentioned in Sect. 2 (or a subset thereof) are used as input to the BPNNs under test, and the capacity of the batteries is taken as the output. The proposed adaptive (AD) hidden layer four-dimensional (4D) network (AD4DBPNN), with voltage, current, temperature and charging time as input parameters, is compared with an AD hidden layer three-dimensional (3D) network (AD3DBPNN), with voltage, current and temperature as inputs, and with a fixed hidden layer 4D network (4DBPNN) having the same inputs as the AD4DBPNN.

It should be noted that the method proposed in this paper aims to adaptively change the number of hidden layer units in order to obtain the optimal neural network structure every time the dataset changes. Indeed, the optimal network structure for two distinct training datasets, say A and B, is not necessarily the same. Therefore, if the network structure is set based on the training dataset A, then this network may not be suitable for use with dataset B: the network may be suboptimal and poor prediction results may be obtained.

In what follows, we use the first part of each dataset for training and the remaining part for testing. The number of hidden layer nodes of the 4DBPNN is set equal to the optimal number determined for the AD4DBPNN when trained with 70% of the data of each battery and is then kept fixed.

The AD4DBPNN, AD3DBPNN and 4DBPNN are implemented in MATLAB 2021. With the exception of the number of hidden layer nodes and the dimension of the input, all BPNNs use the same structure and settings:

  • Maximum training epochs equal to 1000;

  • Learning rate set to 0.001;

  • Goal error of training (MAE) equal to 0.0001;

  • Range of the connection weights and thresholds in [−1, 1].
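Expressed as a configuration structure (names here are illustrative, not taken from the authors' MATLAB implementation), the common settings are:

```python
CONFIG = {
    "max_epochs": 1000,        # maximum training epochs
    "learning_rate": 0.001,    # gradient-descent step size
    "goal_mae": 1e-4,          # training stops once the MAE goal is reached
    "init_range": (-1.0, 1.0)  # initial weights and thresholds drawn from this range
}
```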

The root mean squared error (RMSE) [47] and MAPE [48] are used as evaluation functions:

$$\begin{aligned} \mathrm{{RMSE}}= & {} \sqrt{\frac{1}{n}\sum _{i=1}^{n}({\hat{y}}_i-y_i)^{2} }, \end{aligned}$$
(13)
$$\begin{aligned} \mathrm{{MAPE}}= & {} \frac{100\%}{n} \sum _{i=1}^n \left|\frac{{\hat{y}}_i-y_i}{y_i} \right|, \end{aligned}$$
(14)

where \({\hat{y}}_{i}\) and \({y}_{i}\) represent the estimated and true values of the SOH, respectively, and n is the number of test samples.
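Both metrics are straightforward to implement; for instance:

```python
import numpy as np

def rmse(y_hat: np.ndarray, y: np.ndarray) -> float:
    """Root mean squared error, Eq. (13)."""
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))

def mape(y_hat: np.ndarray, y: np.ndarray) -> float:
    """Mean absolute percentage error (%), Eq. (14)."""
    return float(np.mean(np.abs((y_hat - y) / y)) * 100.0)
```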

First, 70% of the data of the four batteries are used for training, and the remaining 30% are used for testing the effectiveness of the BPNNs. Figure 9 depicts the results of the AD4DBPNN, AD3DBPNN and 4DBPNN, from which we verify that the predictions are relatively close to the real values. When the charging time is included as input (AD4DBPNN and 4DBPNN), the networks yield smaller errors than the AD3DBPNN. Figures 10 and 11 show the RMSEs and MAPEs, respectively. One can see that, after adding the charging time as an input, the RMSE for battery B0005 decreases from 1.87% to 0.41%. For the other three LIBs, the AD4DBPNN still has the lowest RMSE, or values close to those of the 4DBPNN. Moreover, with the charging time as input, the MAPE decreased by 0.4%, 0.52% and 0.35% for batteries B0006, B0007 and B0018, respectively.

Then, we reduce the proportion of the training set from 70% to 60% of the data. Figure 12c and d highlights that the predictions of the AD4DBPNN are more accurate than those of the AD3DBPNN, meaning that, without the charging time as input to the neural network, the error of the SOH prediction becomes larger. From Fig. 12a-d, one can see that the predictions of the AD4DBPNN are closer to the true values than those of the 4DBPNN, whose hidden layer remains unchanged. As shown in Fig. 13, the RMSE of the AD4DBPNN predictions is 30.3% lower than that of the 4DBPNN. As shown in Fig. 14, for batteries B0006, B0007 and B0018, the MAPE of the AD4DBPNN decreased by 36.3%, 23.5% and 23.4%, respectively, compared with the values of the 4DBPNN. This shows that, for the same dataset, decreasing the proportion of training data while keeping the number of hidden layer units of the BPNN constant leads to a local optimum: the training set (the first 60%) exhibits small errors while the testing set (the last 40%) exhibits large ones, which can be seen in Fig. 12 where the capacity drops from 1.5 to 1.4 Ah.

Finally, the training set was further reduced to 50% of the data. Figure 15 shows the performance of the three BPNNs. It can be seen that the AD4DBPNN exhibits the best generalization and accuracy, while the results of the AD3DBPNN and 4DBPNN show different degrees of fluctuation. Figure 16 compares the RMSE of the predicted results. We verify that the AD4DBPNN still maintains good accuracy, namely 1.45% for the worst performing battery (B0006). In contrast, both the AD3DBPNN and 4DBPNN have RMSEs exceeding 1.9%, revealing poor predictive capability. Figure 17 shows that the proposed method achieves 0.65% MAPE on the worst performing dataset, while the other two algorithms yield 0.95% and 1.18%, respectively. Table 1 summarizes the best number of hidden layer nodes of the BPNNs as the training set changes. It can be seen that the number of hidden layer nodes yielding the best results differs between training sets.

Table 1: Adaptive hidden layer BPNN process

In general, the AD4DBPNN is more capable of predicting the SOH of LIBs when the training dataset changes. When the charging time is used as input for training, the prediction accuracy is effectively improved. In addition, when the proportion of training data drops to 50%, the maximum MAPE of the AD4DBPNN is only 0.65%, which attests to its effectiveness. As shown in Figs. 9-17, the AD4DBPNN maintains high accuracy when the battery dataset used to train the neural network changes. When the dataset changes from B0007 to B0018, the total number of samples decreases from 168 to 132, and the AD4DBPNN still predicts the SOH with very high accuracy. These results show that the AD4DBPNN has good generalization capability.

5 Conclusion

This paper proposed a three-layer, four-dimensional, adaptive hidden layer network (AD4DBPNN), with voltage, current, temperature and charging time as input parameters, to predict the SOH of LIBs. The method determines the optimal number of hidden layer units based on the MAE of the prediction for a training dataset. The accuracy of the SOH prediction is improved by including the charging time as input to the BPNN, since it is highly correlated with the SOH of the LIB. Even when the training data are reduced to 50% of the full dataset, the proposed method shows good accuracy, with only 1.45% RMSE and 0.65% MAPE on the worst performing dataset. Although the proposed method uses an adaptive scheme to determine the optimal structure of the BPNN, it still relies on the gradient descent method for training. Future work will address the use of a suitable algorithm to optimize the iteration of the network weights and thresholds.