Data-Based Interpretable Modeling for Property Forecasting and Sensitivity Analysis of Li-ion Battery Electrode

Lithium-ion batteries have become one of the most promising technologies for accelerating clean automotive applications, and the electrode plays a pivotal role in determining battery performance. Because the processes used to produce battery electrodes are strongly coupled and highly complex, it is imperative to develop an effective solution that can predict electrode properties and perform reliable sensitivity analysis on the key features and parameters of the production process. This paper proposes a novel tree boosting model-based framework to analyze and predict how battery electrode properties vary with respect to parameters at the early production stage. Three data-based interpretable models, namely AdaBoost, LPBoost, and TotalBoost, are presented and compared. Four key parameters, comprising three slurry feature variables and one coating process parameter, are analyzed to quantify their effects on both the mass loading and the porosity of the battery electrode. The results demonstrate that the proposed framework can provide efficient quantitative analysis of the importance and correlation of the related parameters and produce satisfactory early-stage predictions of battery electrode properties. These results can deepen the understanding of battery electrodes and facilitate the optimization of battery electrode design for automotive applications.


Introduction
Global challenges including dramatic climate change and depleting fossil fuel reserves have spurred the acceleration of sustainable transportation technologies. Owing to their high energy density and low self-discharge rate, lithium-ion (Li-ion) batteries have become one of the most promising energy storage devices in automotive applications such as electric vehicles (EVs) and hybrid electric vehicles (HEVs) [1]. However, the performance of Li-ion batteries, including capacity, service life, and energy and power densities, depends heavily on their electrode properties, which are largely determined by the related production stages. To optimize battery performance, it is vital to understand the correlation between production parameters and battery electrode property variables [2].
Unfortunately, battery electrodes are produced through many intermediate substages with numerous strongly-coupled variables or parameters [3]. As multi-disciplinary knowledge including electrical, mechanical, and chemical information is often involved in producing battery electrodes, analyzing the importance of and correlation between the intermediate process parameters and the battery electrode variables is challenging and usually relies on engineering expertise and trial and error [2,4]. These conventional methods are extremely laborious and time-consuming, and can even lead to deteriorated quality control and difficulty in product rating at earlier stages [3]. In this context, it is imperative to devise an effective data-based strategy to perform reliable sensitivity analysis on the aforementioned aspects.
With the rapid development of machine learning and intelligent computing techniques, data-based modeling solutions have become popular tools for managing the final battery product [5,6]. A great deal of research has been conducted on internal battery state estimation [7,8], battery aging prognostics [9,10], battery fault diagnostics [11], cell equalization management [12], charging control [13,14], and energy management [15,16]. Usually, effective battery management can be achieved by developing suitable data-based models [17]. However, these works mainly focus on improving the macro performance of batteries, while little has been done to enhance the related production process by considering the microscopic properties of battery electrodes [18]. As battery electrodes play a vital role in determining the final performance of battery cells, it is also worthwhile to design effective data-based models for efficient analysis and prediction of battery electrode properties.
Compared with the in-situ operation management of batteries, for which data-based modeling strategies are abundant, fewer works have derived data-based models to benefit battery production management. Among the limited research on the monitoring [19] and adjustment [20] of battery production, a suitable data-based model to analyze and forecast battery intermediate properties is especially crucial. For example, on the basis of the cross-industry standard process (CRISP), data-based models using both linear regression and neural networks are proposed in Ref. [21] to predict battery properties and to identify the dependencies within the production chain. By using a recursive feature elimination method, a data-based model is derived in Ref. [22] to identify improvement potentials and to analyze the relevant elements within the drying stage of battery production. After designing a statistical data-based solution to analyze the fluctuations of battery production, their effects on battery capacity are evaluated in Ref. [23]. The data-driven methodologies in most existing research are mainly used to perform property prediction during battery production, while little has been done on designing data-based solutions, especially with an interpretable framework, for in-depth analysis of the effects of process parameters on battery properties at key production stages [24,25]. Notably, the parameters of early production stages such as mixing and coating are crucial in determining the properties of battery electrodes [26]. To achieve smarter battery production management and optimize battery electrode properties, it is vital to perform efficient sensitivity analyses of battery electrode properties with respect to the mixing and coating specifications.
Given the aforementioned considerations, a boosting-tree-based interpretable machine learning framework is proposed for the sensitivity analysis of key production process parameters and the prediction of battery electrode porosity qualities. This paper focuses on the effects of key mixing and coating parameters on the final battery electrode properties. The key contributions of this paper are summarized as follows: (1) to quantify the importance and correlation of four key parameters from both mixing and coating via a well-designed interpretable machine learning framework; (2) to classify and predict battery electrode porosity at early production stages via effective data-based models; (3) to evaluate and compare the performance of the typical AdaBoost model and two other improved tree boosting-based models (LPBoost and TotalBoost) for battery electrode property classification. All these developments can help battery engineers to produce high-performance batteries effectively, further benefitting smart battery production management for automotive applications.

Battery Electrode Properties and Data Curation
In this section, the fundamentals of battery electrode production and two key electrode properties are introduced, followed by the description of battery electrode data curation.

Fundamentals of Battery Electrode Production
As a key element of Li-ion batteries, the electrode comes in two types, anode and cathode. Producing high-quality battery electrodes generally requires a highly complicated production process [27]. As shown in Fig. 1, after the electrode component formulation, such as the electrode additive, active material, and polymeric binder, is prepared, a mixing process combines these components according to the formulation within a mixer such as a soft blender to generate homogeneous slurries. Then, a coating process is performed to coat the slurries onto the surface of the collector foils. In general, copper foil is used for anode electrodes, while aluminum foil is often adopted for cathode electrodes. The intermediate product from the coating stage is then dried in an oven at a predefined temperature and calendered by applying mechanical pressure through two cylindrical rolls. Finally, a cutting process cuts the electrodes into proper shapes for coin or pouch battery cells. It should be noted that the whole production process of battery electrodes involves electrical, mechanical, and chemical operations, and specific equipment is usually required at all intermediate production stages [26]. As two key processes in producing battery electrodes, mixing and coating are complicated, with many strongly-coupled parameters, and these process parameters directly determine the properties of battery electrodes and thereby affect the relevant battery performance. In this context, a reliable solution is desired to forecast the properties of battery electrodes and to analyze the sensitivity of the electrode properties to the related process parameters at the early production stage, so as to improve battery product performance. To achieve this, a tree boosting model-based interpretable machine learning framework is designed to forecast two properties of the battery electrode, while both the importance and the correlations of some key process parameters are also analyzed and quantified.
Specifically, two electrode property key performance indicators (KPIs) are adopted: the electrode mass loading, in mg/cm², and the porosity, in %. The effects of three mixing parameters, namely the mass content (MC) of active materials, the solid-to-liquid ratio (StLR), and the viscosity (Vis), as well as one coating parameter, the comma gap (CG), on these two electrode properties are investigated. Here StLR represents the mass ratio between the slurry solids and the liquid, Vis affects the shear rate of the coating stage, and CG is the gap between the coating comma and the coating roll. Both properties are worth exploring: the electrode mass loading, measured by a high-precision scale, is directly related to the final battery capacity, while the electrode porosity is crucial when dealing with highly porous electrodes and is affected by other properties such as coating thickness and coating density.

Battery Electrode Data Curation
Data curation is a key step for modeling activities. Without loss of generality, the battery electrode experimental dataset from Franco, Laboratoire de Réactivité et Chimie des Solides, is adopted. Detailed experimental information and an explanation of the data can be found in Ref. [28]. The original dataset contains 656 samples in total. For each sample, three slurry parameters after mixing and one coating parameter are associated with one mass loading observation and one porosity observation of the battery electrode. To fully investigate the prediction/classification performance of the designed tree boosting-based models, both battery electrode mass loading and porosity are labeled with five classes (very low, low, medium, high, and very high). Specifically, for the electrode mass loading, very low and low represent the ranges of (0, 15] and (15, 25], respectively. Medium stands for the range of (25, 35], while high and very high refer to the ranges of (35, 45] and (45, 60], respectively. For the electrode porosity, very low refers to the range of (0, 47.5], low refers to the range of (47.5, 50], medium reflects the range of (50, 52.5], while high and very high refer to the ranges of (52.5, 55] and (55, 70], respectively. After the battery electrode mass loading and porosity are assigned these predefined class labels, the tree boosting model-based interpretable machine learning framework for prediction/classification of both electrode properties and for sensitivity analyses of the process parameters can be designed.
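The labeling rule above can be sketched in a few lines of Python. This is only an illustration of the right-closed intervals listed in the text; the function and constant names (label_value, MASS_LOADING_EDGES, POROSITY_EDGES) are hypothetical and are not taken from the original study.

```python
from bisect import bisect_left

LABELS = ["very low", "low", "medium", "high", "very high"]

# Upper edges of the right-closed intervals (lo, hi] listed in the text.
MASS_LOADING_EDGES = [15.0, 25.0, 35.0, 45.0, 60.0]  # mg/cm^2
POROSITY_EDGES = [47.5, 50.0, 52.5, 55.0, 70.0]      # %


def label_value(value, upper_edges, labels=LABELS):
    """Return the class label of `value` for right-closed bins (lo, hi]."""
    if value <= 0 or value > upper_edges[-1]:
        raise ValueError(f"{value} lies outside (0, {upper_edges[-1]}]")
    # bisect_left finds the first edge that is >= value, so a value equal
    # to an upper edge is assigned to the bin that the edge closes.
    return labels[bisect_left(upper_edges, value)]
```

For instance, label_value(30.0, MASS_LOADING_EDGES) falls in (25, 35] and is therefore labeled "medium". Values outside the overall range raise an error here; how such observations were handled in the original study is not stated in this section.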

Ensemble Learning Technologies
In this section, the fundamentals of AdaBoost are first introduced, followed by the descriptions of another two improved boosting techniques including LPBoost and TotalBoost. Then the framework of using tree boosting-based model to predict battery electrode properties and analyze related process parameters of interest is designed. Some indicators are also given to evaluate the performance of established models.

AdaBoost, LPBoost, and TotalBoost
Boosting is one of the most popular ways to derive ensembled tree-based models. The key idea of boosting is to train different weak hypotheses sequentially, while the distribution of the training dataset is changed dynamically based on the performance of the previously trained weak hypotheses. Adaptive boosting (AdaBoost) is a typical and effective boosting solution for prediction applications [29]. Let the training dataset $TD$ include $M$ observations:

$$TD = \{(x_1, y_1), (x_2, y_2), \ldots, (x_M, y_M)\}$$

Here $x_m$ $(m = 1, 2, \ldots, M)$ is the input vector of the corresponding battery process parameters, $y_m$ $(m = 1, 2, \ldots, M)$ is the preset classification label, with a total of $C$ classes, and $L(x)$ is a weak hypothesis that provides an output result related to $x$. The detailed process to establish the AdaBoost-based classification model is summarized in Workflow 1.

Fig. 1 Typical processes of battery electrode production
Based upon AdaBoost, two other effective boosting-based strategies, linear programming boosting (LPBoost) and total boosting (TotalBoost), are also explored. Both LPBoost and TotalBoost follow a workflow similar to that of AdaBoost, but these two improved solutions are self-terminating and produce ensembles with small weights. The processes to establish the LPBoost-based model and the TotalBoost-based model are described in Workflow 2 and Workflow 3, respectively. Specifically, LPBoost adopts a weighted linear combination of weak hypotheses, so that a weak hypothesis can be added in each iteration while the weights of the previous weak hypotheses are adjusted [30]. TotalBoost realizes the classification by maximizing the minimal margin [31]. More details of these two boosting solutions can be found in Ref. [32].
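To make the reweighting idea concrete, the following is a minimal sketch of multi-class AdaBoost with decision stumps as weak hypotheses. It is not a transcription of Workflow 1, which is not reproduced in this text: it assumes the widely used SAMME variant, in which the hypothesis weight is log((1 - err)/err) + log(C - 1), and it assumes that the learning rate simply scales that weight. All function names are hypothetical.

```python
import math


def fit_stump(X, y, w, classes):
    """Return (weighted_error, feature, threshold, left_label, right_label)
    for the best single-split tree, or None if no split is possible."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = {c: 0.0 for c in classes}
            right = {c: 0.0 for c in classes}
            for row, label, wi in zip(X, y, w):
                side = left if row[f] < t else right
                side[label] += wi
            # Each leaf predicts its weighted-majority class.
            cl = max(classes, key=left.get)
            cr = max(classes, key=right.get)
            err = sum(w) - left[cl] - right[cr]
            if best is None or err < best[0]:
                best = (err, f, t, cl, cr)
    return best


def stump_predict(stump, row):
    _, f, t, cl, cr = stump
    return cl if row[f] < t else cr


def adaboost_fit(X, y, n_rounds=50, learning_rate=1.0):
    """Train a SAMME-style ensemble; returns (list of (alpha, stump), classes)."""
    classes = sorted(set(y))
    C, M = len(classes), len(X)
    if C < 2:
        raise ValueError("at least two classes are required")
    w = [1.0 / M] * M                  # uniform initial distribution
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w, classes)
        if stump is None:
            break
        err = stump[0] / sum(w)
        if err >= 1.0 - 1.0 / C:       # no better than random guessing
            break
        err = max(err, 1e-12)          # guard against log(0)
        alpha = learning_rate * (math.log((1.0 - err) / err) + math.log(C - 1.0))
        ensemble.append((alpha, stump))
        if err == 1e-12:               # perfect stump: nothing left to reweight
            break
        # Up-weight the misclassified observations, then renormalise.
        w = [wi * math.exp(alpha) if stump_predict(stump, row) != label else wi
             for row, label, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble, classes


def adaboost_predict(ensemble, classes, row):
    """Weighted vote of all weak hypotheses."""
    votes = {c: 0.0 for c in classes}
    for alpha, stump in ensemble:
        votes[stump_predict(stump, row)] += alpha
    return max(classes, key=votes.get)
```

In this sketch the number of rounds is fixed in advance, which mirrors how N is preset for AdaBoost. LPBoost and TotalBoost instead re-optimize the hypothesis weights at every iteration by solving an optimization problem, which is not attempted here.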

Framework of Designing Tree Boosting-Based Model for Analyzing Battery Electrode Properties
To predict the properties of battery electrodes well and to analyze the sensitivity of the mixing- and coating-related parameters effectively, a novel interpretable machine learning framework with a tree boosting-based model structure, shown in Fig. 2, is designed. Specifically, three mixing parameters (MC, StLR, and Vis) and one coating parameter (CG) are used as the inputs to the model, while the relevant manufactured battery electrode properties, electrode mass loading and porosity, are used as outputs. As illustrated in Fig. 3, the detailed framework for using the tree boosting-based technique to carry out the sensitivity analysis and to forecast/classify the qualities of electrode mass loading and porosity can be summarized in four key parts as follows:
Part 1. Data preprocessing. The raw data related to battery electrode production are first preprocessed to remove outliers and fill in missing values. By setting the thresholds of manufactured electrode mass loading and porosity to [5 mg/cm², 60 mg/cm²] and [30%, 70%], respectively, the outliers outside these thresholds are removed, which provides a robust result under uncertainties during battery production. Afterward, the output observations are assigned the relevant classification labels. According to the predefined rules in Sect. 2.2, five labels (very low, low, medium, high, and very high) are set to reflect the qualities of both battery electrode mass loading and porosity.
Part 2. Tree boosting-based model construction. To establish effective tree boosting-based models, the hyperparameters of the AdaBoost, LPBoost, and TotalBoost-based models need to be determined. For all three boosting solutions, a decision tree is usually selected as the weak hypothesis. Two main hyperparameters need to be preset: the number of ensembled decision trees (N) and the learning rate (r). In theory, a large N improves the prediction accuracy, but too many decision trees also cause overfitting and increase the computational effort. To determine a suitable N, an iterative strategy that compares learner weights for various numbers of weak hypotheses is adopted. The learning rate r reflects the decay rate of each hypothesis's weight and thus affects the performance of each decision tree. As suggested by Ref. [33], r can be set to 0.1 for general prediction/classification applications. After these two hyperparameters are set, all the tree boosting-based models can be trained with the process illustrated in their respective workflows.
Part 3. Importance and correlation analyses. To quantify how strongly the battery production process parameters affect electrode properties, the Gini index, which represents the change in impurity due to the splits on each parameter, is used. For tree-based classification, impurity indicates, in theory, how good a potential split is for a node of a decision tree [33]. The larger the Gini index value a process parameter obtains, the greater the effect this parameter has on the battery electrode. In addition, to carry out the correlation analysis of each pair of battery production parameters, the predictive measure of association (PMOA) is used.
Supposing the two parameters of interest are $p_i$ and $p_j$, the PMOA value reflecting their correlation is calculated by:

$$PMOA_{i,j} = \frac{\min(OP_l, OP_r) - (1 - OPl_{i,j} - OPr_{i,j})}{\min(OP_l, OP_r)} \qquad (1)$$

where $l$ and $r$ represent the left child and right child of a node, and $y$ and $z$ are the split thresholds of $p_i$ and $p_j$; $OP_l$ and $OP_r$ are the observation proportions with $p_i < y$ and $p_i \geq y$, respectively; $OPl_{i,j}$ is the observation proportion under the condition of $p_i < y$ and $p_j < z$, while $OPr_{i,j}$ is the observation proportion under the condition of $p_i \geq y$ and $p_j \geq z$. PMOA is obtained by comparing all potential splits against the best split found during the training stage of the decision tree. In this context, PMOA quantifies the similarity between different rules for splitting the observations.
Part 4. Results visualization. After the Gini index is obtained for each parameter of interest, the relevant feature importance ranking can be generated. Using Eq. (1), the PMOA values of all parameter pairs can be obtained and shown as a 4 × 4 heat map. Both visualizations help users to understand directly the importance and correlations of the parameters that affect battery electrode properties.
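The two quantities used in Part 3 can be illustrated for a single split as follows. The sketch assumes the standard Gini impurity (one minus the sum of squared class proportions) and the usual form of the predictive measure of association, (min(OP_l, OP_r) - (1 - OPl - OPr)) / min(OP_l, OP_r), built from the proportions defined in the text. How the per-split values are aggregated over all trees of an ensemble is not specified in this section and is not attempted; the function names are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_c p_c^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity reduction obtained by splitting on `values < threshold`."""
    left = [lab for v, lab in zip(values, labels) if v < threshold]
    right = [lab for v, lab in zip(values, labels) if v >= threshold]
    n = len(labels)
    return (gini_impurity(labels)
            - len(left) / n * gini_impurity(left)
            - len(right) / n * gini_impurity(right))


def pmoa(p_i, p_j, y, z):
    """Association between the split p_i < y and the split p_j < z."""
    n = len(p_i)
    op_l = sum(a < y for a in p_i) / n                 # proportion sent left
    op_r = 1.0 - op_l                                  # proportion sent right
    op_ll = sum(a < y and b < z for a, b in zip(p_i, p_j)) / n
    op_rr = sum(a >= y and b >= z for a, b in zip(p_i, p_j)) / n
    smaller = min(op_l, op_r)
    if smaller == 0:
        raise ValueError("the split on p_i sends every observation to one side")
    return (smaller - (1.0 - op_ll - op_rr)) / smaller
```

A value of 1 means that the split on p_j sends every observation to the same side as the split on p_i, while values at or below 0 mean that the second split is no better at reproducing the first than always choosing the larger child.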

Performance Indicators
To further quantify and investigate the prediction/classification performance, the confusion matrix (CM) is adopted as a key performance indicator. Supposing positive stands for the class of interest while negative relates to the other classes, four basic elements, the true positive ($TP$), true negative ($TN$), false positive ($FP$), and false negative ($FN$), can be obtained. Afterwards, the positive predictive value $PPV_{C_i}$ and the false discovery rate $FDR_{C_i}$ of the class of interest $C_i$ can be obtained:

$$PPV_{C_i} = \frac{TP}{TP + FP}, \qquad FDR_{C_i} = \frac{FP}{TP + FP}$$

Then a popular performance indicator for the accuracy of classification results, the micro F1 score ($microF1$), can be calculated as:

$$microF1 = \frac{TP_{all} + TN_{all}}{N_{total}}$$

where $TP_{all}$ and $TN_{all}$ represent all correct classifications, and $N_{total}$ is the total number of observations. In addition, the receiver operating characteristic (ROC) curve and its area under curve (AUC) value are adopted to explore the results of electrode property prediction/classification. The ROC curve is a statistical plot that reflects the diagnostic ability of a classification model as its discrimination threshold is varied [34]. The AUC measures the degree of separability of the classes.
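These indicators can be computed directly from a confusion matrix, as in the sketch below. It assumes that rows index the true class and columns the predicted class, and it interprets "all correct classifications" as the diagonal of the matrix, so that the micro F1 score coincides with the overall accuracy, as it does for single-label multi-class problems. The AUC helper uses the rank-based (pairwise comparison) definition for one class against the rest. All names are hypothetical.

```python
def ppv_fdr(cm):
    """Per-class (PPV, FDR) pairs; cm[i][j] counts true class i predicted as j."""
    n = len(cm)
    result = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp   # rest of column k
        predicted = tp + fp
        if predicted == 0:                          # class never predicted
            result.append((float("nan"), float("nan")))
        else:
            result.append((tp / predicted, fp / predicted))
    return result


def micro_f1(cm):
    """Share of correctly classified observations (diagonal over total)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


def auc(scores, is_positive):
    """Probability that a positive scores above a negative (ties count half)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("both positive and negative observations are required")
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))
```

Since PPV and FDR share the same denominator, they always sum to 1 for a class that is predicted at least once.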

Results and Discussions
In this section, after all steps of the parts described in Sect. 3.2 are completed, forecasting and sensitivity analysis tests are carried out with the designed tree boosting-based models to quantify both the importance and the correlations of the four input battery production parameters of interest, while the qualities of battery electrode mass loading and porosity are also classified. Without loss of generality, all designed tree boosting-based models (AdaBoost, TotalBoost, and LPBoost) are evaluated using five-fold cross-validation. Across the different folds of cross-validation, the quantified importance and correlation values present the same trends without large differences.
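As a reminder of how five-fold cross-validation partitions the data, a minimal index-splitting sketch is given below. The shuffling seed and the absence of stratification by class are assumptions made for illustration; the section does not state how the folds were drawn.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test
```

Each observation appears in exactly one test fold, so every model is evaluated once on every sample while never being trained on the sample it is tested on.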

Model Training
The first case study focuses on the prediction of battery electrode mass loading and the relevant sensitivity analysis of the four process parameters. To evaluate whether the training of all three tree boosting-based models converges and to avoid overfitting, a test using tree stumps (trees with a maximum of one split) as weak hypotheses is carried out. Figure 4 illustrates the training error as the number of tree stumps increases. The training errors of all three tree boosting-based models converge to 0 after 23 tree stumps are used, indicating that reasonable convergence can be achieved for the battery electrode mass loading case. The number of ensembled decision trees (N) for AdaBoost is then set to a default value of 50. As the hypothesis weights of both LPBoost and TotalBoost decrease with the number of ensembled weak hypotheses, their hypothesis weights after compacting the corresponding weak hypotheses (decision trees) are illustrated in Fig. 5 to determine N for these two models. Both the LPBoost and TotalBoost-based models present clearly decreasing trajectories as the number of ensembled decision trees increases. Here, the weights of LPBoost and TotalBoost become negligible after 19 and 14 decision trees are used, respectively, indicating that a satisfactory convergence of model training can be achieved. In this context, N is set to 19 for the LPBoost model and 14 for the TotalBoost model for the case study of battery electrode mass loading.
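The rule of keeping only the weak hypotheses whose weights are not negligible can be sketched as follows. The numerical tolerance is a hypothetical choice, since the text does not state a cutoff, and the function name is illustrative.

```python
def ensemble_size(weights, tol=1e-3):
    """Smallest N such that every hypothesis weight from index N on is <= tol."""
    n = len(weights)
    # Walk back from the end while the trailing weights are negligible.
    while n > 0 and abs(weights[n - 1]) <= tol:
        n -= 1
    return n
```

Scanning from the end rather than stopping at the first small weight keeps any later hypothesis that still carries a meaningful weight inside the ensemble.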

Parameter Sensitivity Analysis
After the hyperparameters of all tree boosting-based models are set, the sensitivity analyses of how the process parameters of interest affect battery electrode mass loading can be conducted. After the Gini index values of MC, StLR, CG, and Vis are calculated, their importance ranking is quantified, as illustrated in Fig. 6. CG and Vis achieve the highest and lowest importance among the four parameters of interest, indicating that CG has the greatest effect and Vis the smallest effect on the mass loading of the electrode. This conclusion is reasonable, as the coating weight and thickness, which directly determine battery electrode mass loading, are in theory significantly affected by CG.
To quantify the correlations of each process parameter pair for electrode mass loading case, the PMOAs of all pairs derived from four parameters (MC, StLR, CG, and Vis) are calculated and visualized with a heat map matrix, as shown in Fig. 7. Quantitatively, the largest value of PMOA is achieved for the pair of MC and StLR, but this value is just

Classification Performance Evaluation
After the tree boosting-based models are established, the microF1 values of battery electrode mass loading prediction for the AdaBoost, LPBoost, and TotalBoost models are listed in Table 1. The corresponding confusion matrices and ROC curves are shown in Figs. 8 and 9, respectively. Battery electrode mass loading can be well predicted from these four parameters at the early production stage, as the microF1 values for all models are larger than 89%. Quantitatively, the TotalBoost-based model presents the best classification result with a microF1 value of 92.7%, which is 2.8% and 3.3% larger than that of the LPBoost and AdaBoost cases, respectively. From Fig. 9, the AUC value of the TotalBoost case is the highest at 0.97, which is 4.3% and 3.2% larger than that of the AdaBoost and LPBoost cases, respectively. In light of this, the designed TotalBoost model presents the most competent performance for classification/prediction in the battery electrode mass loading case.

Model Training
The second case study focuses on the Li-ion battery electrode porosity. As illustrated in Fig. 10, the training errors of both the TotalBoost and LPBoost cases converge to 0 once the number of tree stumps exceeds 33, while the error of the AdaBoost case converges to 0.06 after 42 tree stumps are used. Therefore, all three tree boosting-based models are capable of achieving reliable convergence for the battery electrode porosity case. Without loss of generality, the hyperparameter N of AdaBoost is again set to 50. To determine N for the LPBoost and TotalBoost cases, their learner weights after compacting the weak hypotheses (decision trees) are shown in Fig. 11. The weights of LPBoost and TotalBoost become negligible after 32 and 12 decision trees are used, respectively. In light of this, N is set to 32 for LPBoost and 12 for TotalBoost for the battery electrode porosity prediction and analysis case.

Parameter Sensitivity Analysis
For the sensitivity analysis of battery electrode porosity, after the Gini index values of MC, StLR, CG, and Vis are calculated, the corresponding importance ranking is visualized in Fig. 12. Quantitatively, the importance values of StLR and Vis are 0.051 and 0.049, which are higher than those of the other parameters. This implies that StLR and Vis are the two most important process parameters in determining battery electrode porosity. In contrast, MC shows the minimum Gini index of 0.032, which means that it has the smallest effect on the classification/prediction of battery electrode porosity.
To quantify the correlations of each parameter pair for the porosity case, the PMOAs of all pairs are calculated and visualized with a heat map matrix, as illustrated in Fig. 13. The PMOA of the MC and StLR pair gives the largest value, around 0.9, indicating a relatively strong correlation between these two process parameters. This result is expected, as the mass ratio between the slurry solids and the liquid correlates strongly with the electrode porosity in theory. For the other parameter pairs, the PMOAs are all lower than 0.6, indicating that no strong correlations exist between these parameter pairs in determining the battery electrode porosity.

Classification Performance Evaluation
After all three trained tree boosting-based models are used to forecast/classify battery electrode porosity, their confusion matrices and microF1 values are given in Fig. 14 and Table 2, respectively. Quantitatively, the AdaBoost-based model presents the worst classification result with a microF1 of 71.2%, while the TotalBoost-based model achieves the best result with a microF1 of 74.1%, which is 2.1% better than that of LPBoost. Figure 15 illustrates the ROC curves of the battery electrode porosity classification results for the various tree-based models. The AUC values of all models are higher than 0.9. Quantitatively, the AUC of the TotalBoost-based model is the largest at 0.94, which is 2.2% and 3.3% larger than that of the LPBoost-based model and the AdaBoost-based model, respectively. Therefore, the proposed tree boosting-based interpretable machine learning framework is able to forecast/classify the battery electrode porosity with satisfactory AUC values at the early mixing and coating stages, and the TotalBoost-based model again shows the most competent performance among the three adopted boosting techniques. In addition, the prediction performance for the "high" label is worse than for the other labels in both the electrode mass loading and porosity cases. This is mainly because the "high" and "very high" classes determined by the battery manufacturer are similar in nature, so some observations with the "high" label are classified into the "very high" class.

Comparisons with Other Approaches
To further evaluate the performance of the designed boosting-tree-based models, three other widely used approaches, the decision tree (DT), k-nearest neighbors (KNN), and support vector machine (SVM), are compared here. Specifically, DT is a single tree. KNN is a typical instance-based learning approach that relies on stored instances for classification. SVM is a kernel-based classification method that maps inputs to a high-dimensional space [35]. Without loss of generality, five-fold cross-validation is used for these comparisons, and the performance indicators for both electrode mass loading and porosity are listed in Table 3. DT presents the worst results, while SVM gives the best results, with microF1 values of 89.3% and 68.3% for the electrode mass loading and porosity cases, respectively. However, all these results are worse than those of the boosting-tree-based models. In this context, owing to the benefits of the boosting solution, the proposed framework provides competent predictions of battery electrode mass loading and porosity.

Further Discussions
According to the predictions of both electrode mass loading and porosity, the TotalBoost-based model presents the best results, while the AdaBoost-based model is the worst in both cases. This is mainly because, compared with AdaBoost, both LPBoost and TotalBoost are able to adjust weights automatically. TotalBoost also realizes the classification by maximizing the minimal margin, and therefore generalizes well for battery electrode property predictions. The statistical metrics obtained from the designed tree boosting model-based framework are closely connected to the battery production management strategy. For example, as a battery manufacturing line has numerous strongly-coupled feature parameters, monitoring all of them all the time is inefficient and costly. The battery manufacturer can use the feature importance and correlation information from the proposed approach to readjust the monitored features and better understand the dependencies among them. Moreover, according to the prediction results of electrode properties, the battery manufacturer is able to reoptimize the related parameters to improve the quality of the battery product at an early manufacturing stage, further enhancing battery manufacturing efficiency.

Conclusions
The electrode is a crucial part in determining the performance of batteries and related automotive applications. In this study, an effective tree boosting-based interpretable machine learning framework is designed, for the first time, to perform sensitivity analysis of process parameters and to forecast/classify the battery electrode mass loading and porosity. Some conclusions can be drawn as follows:
(1) StLR is a key process parameter in determining both electrode mass loading and porosity. CG is important for electrode mass loading, while Vis is crucial for the electrode porosity case.
(2) With the largest PMOA value, there is a relatively strong correlation between the MC and StLR pair for both the electrode mass loading and porosity cases.
(3) Electrode mass loading can be well captured by using the process parameters MC, StLR, CG, and Vis, while additional process parameters should be considered to improve the quality classification of battery electrode porosity.
Owing to its data-driven nature and sensitivity analysis ability, the designed tree boosting model-based framework can be used to analyze more battery manufacturing parameters when related data are available, further benefitting the understanding of battery electrode properties and the wider use of batteries in automotive applications.