Data Science-Based Battery Manufacturing Management

Liu, Kailong; Wang, Yujie; Lai, Xin

doi:10.1007/978-3-031-01340-9_3

Part of the book series: Green Energy and Technology (GREEN)

Abstract

This chapter focuses on data science technologies for battery manufacturing management, a key process in the early lifespan of a battery. As a long and complicated process, the battery manufacturing line generally consists of numerous intermediate stages with strongly coupled interdependencies, which directly determine the performance of the manufactured battery. In this context, in-depth exploration and management of the manufacturing parameters and variables, their correlations, and their effects on the properties of intermediate products or on final battery performance are crucial but remain challenging. Recent advances in data-driven analytics and related machine learning strategies have raised interest in data science methods for effective and reasonable management of battery manufacturing.

To give a systematic description of how data science methods can benefit battery manufacturing management, battery manufacturing is first introduced and divided into two main parts: battery electrode manufacturing and battery cell manufacturing. Then the data science framework and related machine learning tools for battery manufacturing management are described in detail. In addition, for both battery electrode manufacturing and cell manufacturing, two case studies of deriving proper data science methods to benefit their management are presented and discussed, respectively. This chapter offers insights into feasible, interpretable data science methods for the effective classification of battery product quality, prediction of manufactured battery performance, and sensitivity analysis of manufacturing parameters of interest, further promoting smart battery manufacturing management.

3.1 Overview of Battery Manufacturing

As the first key stage of the battery full lifespan, the manufacturing process directly and strongly influences the performance of Li-ion batteries, so shortcomings in manufacturing can significantly hinder the improvement of battery technologies. Suitable management of battery manufacturing plays a pivotal role in developing clean and efficient battery-based energy storage systems, and is also a key factor in securing tangible economic payback and improving the efficiency of large-scale clean energy applications [1]. In this context, efforts are urgently required to fully understand the specific production stages, the intermediate products, and the production parameters within a battery manufacturing chain [2].

Unfortunately, the battery manufacturing chain contains numerous intermediate stages with many intermediate products, parameters, and impact variables [1]. Figure 3.1 illustrates the general processes of Li-ion battery manufacturing, which can be divided into three main parts: battery material preparation, electrode manufacturing, and battery cell manufacturing. In the material preparation part, the components needed to produce a battery, such as the active material, electrode additive, and polymeric binder, are prepared according to the requirements of the Li-ion battery type. These components and their formulations must be carefully selected because they significantly affect the quality of the manufactured electrode, such as its electronic conductivity and thickness. In the electrode manufacturing part, the prepared component materials are mixed to generate a slurry. The slurry is then coated onto the surface of a metal foil, followed by a drying stage in an oven at a preset temperature and then a calendering stage to evaporate the residual solvent. In the battery cell manufacturing part, the manufactured electrode is cut into various sizes and then assembled. The electrolyte is then filled and the battery cell is sealed, followed by the forming and testing steps that finish cell manufacturing.

It is worth mentioning that, due to the complexity and strongly coupled interdependencies within each battery manufacturing part, the multiple correlations among the feature variables of intermediate products and the control parameters are still difficult to model. As the whole battery manufacturing chain consists of a number of chemical, mechanical, and electrical operations and generates over 600 influencing parameters or variables, the analysis of feature variables in battery manufacturing requires deep expert experience and specialized equipment, and still relies mainly on trial and error [3]. Therefore, to achieve smarter and cleaner battery production, advanced data science strategies that better quantify feature variables and select key features for predicting battery electrode properties are urgently needed.

With the rapid development of artificial intelligence and machine learning technologies, data-driven strategies have become popular in the field of battery management, and a range of data-driven approaches have been designed for battery applications [4, 5]. Overall, by designing suitable data-driven models, smarter and more efficient management of Li-ion batteries is expected. However, these studies primarily focus on the macroscopic performance of the battery without taking the intermediate properties in the production process into account. The battery production chain also generates a large amount of data and plays a more direct role in determining battery performance. Designing an effective data-driven approach to quantify and predict battery intermediate features is therefore also crucial for boosting the development of cleaner production [6].

In contrast to battery management, where fruitful solutions are available, fewer reports have been found so far on using advanced machine learning technologies to improve battery production [7]. Within the limited literature on battery production (e.g. monitoring, adjustment, and control), developing a proper data-driven model to analyse feature variables and predict intermediate product properties is a hotspot. For instance, a data-driven approach was proposed in [8] to analyse the failure modes and parametric effects, which contributes to improved control of the battery production chain. Based on the cross-industry standard process for data mining (CRISP-DM), Schnell et al. [9] designed a linear model and a neural network model to identify process dependencies and forecast battery production properties. Turetskyy et al. [3] utilized decision tree techniques to analyse feature importance and forecast the maximum battery capacity. A multivariate data-driven model was designed in [10] to discover proper quality gates for predicting the manufactured battery properties. Based upon statistical analyses of fluctuations in battery production, the influence of these fluctuations on the manufactured battery capacity was evaluated in [11]. In [12], several 2D graphs produced by three conventional machine learning classification models were used to analyse the dependencies of battery production features.

Based upon the aforementioned works on the data-driven modelling of battery production, the main research focuses of data science-based battery manufacturing management can be divided into two parts: data collection, and process analysis and property prediction, as illustrated in Fig. 3.2.

Data collection: Data collection is the first main stage of data science-based management of battery manufacturing. This stage focuses on exploring the battery manufacturing chain and on collecting and managing the original battery manufacturing data from manual or automatic data sources. A manual data source usually refers to intermediate manufacturing data that must be entered into the database by hand because no interface is available. In contrast, with the growing availability of digitized equipment for battery manufacturing, automatic data collection allows the data to be gathered into the database automatically. Automatic data collection saves time and effort and also helps ensure the reliability of the collected battery manufacturing data. In this context, advanced digitization systems such as traceability systems [13, 14] are worth developing to further popularize automatic data collection for the data science-based management of battery manufacturing.

Process analysis and property prediction: Here the intermediate process analysis mainly focuses on understanding the nature of the battery manufacturing process by analysing the importance and correlation of manufacturing parameters with suitable data science tools. After the battery manufacturing data are collected, they are preprocessed to generate a suitable data matrix for exploration [15]. State-of-the-art machine learning tools are then used to derive data science models that analyse the sensitivity of the parameters of interest within the battery manufacturing chain [16, 17]. In general, both the process parameters and the intermediate product variables of each battery manufacturing step are set as the explored targets. Because battery manufacturing is a step-by-step operational line, the outputs of one manufacturing stage can be used as the inputs of the next stage for sensitivity analysis. After the analysed results are visualized with heat maps or other human-readable graphs, battery manufacturers can draw useful conclusions and new ideas for readjusting manufacturing parameters and optimizing the management of battery manufacturing. On the other hand, battery cell property prediction mainly focuses on developing suitable data science models to perform effective prognostics, at an early manufacturing stage, for both battery product performance and other manufacturing targets. Here two kinds of properties need to be considered. The first is the performance of the manufactured battery cell, such as its capacity [18], power or energy density, and service life. The other is battery manufacturing goals such as reduced cost and energy loss, avoided manufacturing waste, and increased battery yield.
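As a minimal sketch of the correlation analysis that typically feeds such a heat map, the following Python snippet computes a Pearson correlation matrix between a few manufacturing variables. The variable names and all numerical values are made up for illustration and do not come from any dataset discussed in this chapter.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def correlation_matrix(columns):
    """Return {name: {name: r}} for every pair of named columns."""
    names = list(columns)
    return {p: {q: pearson(columns[p], columns[q]) for q in names} for p in names}

# Hypothetical records: one entry per coated electrode sample (made-up values).
data = {
    "viscosity": [1.0, 1.4, 1.9, 2.3, 3.1],
    "comma_gap": [100, 120, 140, 160, 180],
    "mass_loading": [10.2, 12.1, 14.3, 15.8, 18.4],
}

corr = correlation_matrix(data)
for name, row in corr.items():
    # Each printed row is one line of the matrix a heat map would display.
    print(name, [round(row[q], 3) for q in data])
```

The resulting nested dictionary can be passed to any plotting library to draw the heat map. Pearson correlation only captures linear dependence, so it is a first look rather than a full sensitivity analysis.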

3.2 Data Science Application of Battery Manufacturing Management

For the data science applications of battery manufacturing management, two main issues should be carefully considered. One is the framework used to design a data science-based method for analysis or prediction within the battery manufacturing chain, and the other is the machine learning solution used to build the related data science model.

3.2.1 Data Science Framework for Battery Manufacturing Management

First, the data science-based framework significantly influences the efficiency and results of battery manufacturing analyses. To efficiently analyse manufacturing data from the whole battery manufacturing chain, including material preparation, electrode manufacturing, and cell manufacturing, a classical data science framework named the cross-industry standard process for data mining (CRISP-DM) is widely utilized [3].

Figure 3.3 details the six steps of the CRISP-DM framework: business understanding, data understanding, data preparation, modelling, evaluation, and deployment. In the business understanding step, after the battery manufacturing chain of interest has been explored, the requirements, targets, and restrictions of the data science activities are first defined from the business perspective. In the data understanding step, the raw data from the battery manufacturing chain are initially investigated with direct data analytic and visualization methods such as heat maps and scatter plots, in order to identify further data requirements and assess data quality. The observations obtained in the data understanding step can also be fed back to readjust the data science target of the business understanding step. The data preparation step then focuses on preparing a suitable dataset, extracted from the battery manufacturing data, for the modelling activities [19]. This step can be very time-consuming but is also very important, as the quality of the battery manufacturing data strongly affects the performance of model training. In this context, several data preprocessing strategies such as curation, formatting, integration, and standardization are adopted to generate high-quality manufacturing data. In the modelling step, suitable machine learning tools need to be carefully selected to derive data science models for the different data analysis goals. In this step, the data science model is generally optimized by readjusting its input form and updating both the model structure and the parameter set to achieve satisfactory analyses or predictions for battery manufacturing.

In the evaluation step, the modelling results, especially the sensitivity analyses of battery manufacturing variables and the predicted cell properties, are compared against the predefined data science target, which may lead to new data science goals for battery manufacturing. The obtained results and conclusions can be visualized directly with suitable human-interpretable graphs. This information helps battery manufacturers to draw useful conclusions and to create new evaluation plans for analysing the battery manufacturing process. Based upon the observations obtained from the data science tools, improved solutions or further evaluations can be proposed to optimize the intermediate stages and achieve smarter battery manufacturing management.
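To make the data preparation step more concrete, the sketch below shows two of the preprocessing strategies named above, curation (here simply dropping incomplete records) and standardization (z-scoring each variable). The field names and values are hypothetical, and real manufacturing data would usually need more careful handling of missing entries.

```python
import statistics

def drop_incomplete(records):
    """Curation: keep only records in which every field has a value."""
    return [r for r in records if all(v is not None for v in r.values())]

def standardize(records):
    """Z-score each field: subtract the column mean, divide by the column std."""
    fields = list(records[0])
    columns = {f: [r[f] for r in records] for f in fields}
    mean = {f: statistics.fmean(columns[f]) for f in fields}
    std = {f: statistics.pstdev(columns[f]) for f in fields}
    return [
        # A constant column has zero spread, so it is mapped to 0.0.
        {f: (r[f] - mean[f]) / std[f] if std[f] > 0 else 0.0 for f in fields}
        for r in records
    ]

# Hypothetical raw records with one missing viscosity reading (made-up values).
raw = [
    {"solid_to_liquid_ratio": 0.45, "viscosity": 1.2},
    {"solid_to_liquid_ratio": 0.50, "viscosity": None},
    {"solid_to_liquid_ratio": 0.55, "viscosity": 2.0},
    {"solid_to_liquid_ratio": 0.60, "viscosity": 2.6},
]

clean = drop_incomplete(raw)
scaled = standardize(clean)
print(len(raw), "raw records ->", len(clean), "complete records")
```

Standardizing puts variables with very different units on a comparable scale, which matters for distance-based models such as the kernel methods used later in this chapter.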

3.2.2 Machine Learning Tool

Apart from the data science framework, the machine learning tool is another crucial element for achieving effective data science-based battery manufacturing management. As numerous machine learning tools exist, the key issue becomes how to select a suitable one that provides reliable data analysis results for a given data science application [20]. The machine learning tools for data science-based battery manufacturing management generally take information on process parameters or variables from the battery manufacturing chain as inputs to predict the properties of both intermediate products and final manufactured batteries.

To reflect the nature of machine learning tools in the management of battery manufacturing, state-of-the-art machine learning tools are divided here into non-interpretable and interpretable ones. Non-interpretable tools give only the predicted point result, without information on the importance and correlations of the manufacturing parameters of interest. Yet a key aspect of data analysis in the battery manufacturing chain is not only generating the predicted property value but also providing sensitivity analyses of the investigated process parameters or variables. In light of this, interpretable machine learning tools are gaining much more attention for the smart management of battery manufacturing.

The selection of suitable machine learning tools for data science-based battery manufacturing management is a multifaceted problem that depends on the analysis target, the data size, and the process objective. Table 3.1 systematically summarizes and compares classical machine learning tools, including the linear model, quadratic model, support vector machine (SVM), artificial neural network (ANN), decision tree (DT), random forest (RF), and others, for predicting battery product properties and performing sensitivity analyses in the battery manufacturing chain. It can be concluded that no single machine learning tool can be regarded as a one-size-fits-all solution; instead, there is an inherent trade-off between computational effort and the performance of the corresponding predictions and sensitivity analyses. Moreover, some machine learning tools can only provide a single prediction point in regression or classification form. However, reliable sensitivity analysis that captures the importance and correlation of manufacturing parameters is becoming a firm requirement, because a large number of process parameters or variables exist in the battery manufacturing chain and some of them are strongly coupled with each other [21]. Interpretable machine learning tools such as DT and RF are reasonably capable of providing parameter sensitivity analyses, giving predicted point results, and quantifying parameter importance and correlations within the battery manufacturing chain. Owing to this merit, machine learning tools with strong interpretability are preferable, as the quantified sensitivity analysis results can benefit battery manufacturers [22]. However, data science applications based on interpretable machine learning tools for battery manufacturing management are still in their infancy.

Most existing studies test their interpretable data science models on their own manufacturing data, which calls into question how well these models generalize to other battery manufacturing cases where the operation is significantly different. It is thus suggested to improve these interpretable machine learning tools by validating the related data science models on more complex battery manufacturing cases. In addition, most works simply adopt these tools to analyse battery manufacturing data without an in-depth optimization of their performance. As the performance of a machine learning-based data science model is also significantly influenced by its structure and parameters, advanced model optimization solutions need to be explored to further improve model performance in the field of battery manufacturing management.

Table 3.1 Systematic comparison of typical machine learning tools for battery manufacturing management

3.3 Battery Electrode Manufacturing

3.3.1 Overview of Battery Electrode Manufacturing

The electrode is a key component of a battery. The electrode manufacturing of Li-ion batteries is a highly complex and long process that involves many disciplines such as electrical, chemical, and mechanical engineering. Figure 3.4 illustrates several key steps of battery electrode manufacturing.

According to Fig. 3.4, after proper materials are prepared, battery electrode manufacturing generally consists of several individual processes: mixing, coating, drying, and calendering. The materials generally required for manufacturing a Li-ion battery electrode are the active material, conductive additive, solvent, and binder, as illustrated in Fig. 3.5. Specifically, LiFePO₄ (LFP) and Li₄Ti₅O₁₂ (LTO) are widely used as active materials: LFP has the merits of being non-toxic and adaptable to large current rates and high temperatures, while LTO is capable of mitigating the irreversible formation of the solid electrolyte interphase (SEI) as well as dendrites. Based on this, the manufactured battery can provide high power and long cycling behaviour. For electrode additives, conductive fillers such as carbon black and carbon nanofiber (CNF) are needed because the intrinsic electronic conductivity of the active material alone is usually insufficient. In addition, the polymeric binder plays a vital role in providing mechanical cohesion [2]. In practice, three types of binders, namely polyvinylidene-fluoride (PVDF), polyethylene co-ethyl-acrylate co-maleic-anhydride (TPE), and hydrogenated-nitrile-butadiene-rubber (HNBR), are generally adopted owing to their exceptional chemical stability and reliable binding properties.

After proper materials are prepared, they are mixed in a soft blender to produce a homogeneous slurry during the mixing step. Slurry properties such as the active material mass content, solid-to-liquid ratio, and viscosity play pivotal roles in the subsequent coating and drying stages [23]. During coating, the slurry is coated by a coating machine onto a current collector made of metal foil. In general, copper foil is used for the anode while aluminium foil is used for the cathode. Here the coating speed is usually set to a constant value, while the comma gap of the coating machine is adjusted to generate a shear force that significantly affects the thickness of the coating product. Defects such as pinholes, agglomerates, and non-uniformity may occur during the mixing and coating processes and decrease the electrochemical performance of the final battery, such as its maximum capacity [24]. The wet coating product is then dried in an oven at a preset drying temperature and speed, which strongly affect the electrode’s mechanical and electrochemical properties [25]. After that, a calendering process applies mechanical pressure from two cylindrical rolls to further evaporate the residual solvent of the dried coating product, which benefits the energy density and adhesion of the manufactured electrodes.

All these individual processes (mixing, coating, drying, calendering) generally need specific equipment such as a mixer, coater, and dryer. Numerous manufacturing parameters and variables are generated during these processes. Some parameters, particularly those from the early electrode manufacturing stages such as mixing and coating, are important in determining the manufactured electrode properties and must be well analysed. In this context, data science technology can first benefit the exploration of the parameters and variables involved in producing battery electrodes. Furthermore, with a well-trained data science model, the properties of the manufactured electrode can be predicted without making the cells. This could be an effective and intelligent way to guide the smart management of battery manufacturing and help battery manufacturers optimize their production process.

3.3.2 Case 1: Battery Electrode Mass Loading Prediction with GPR

To illustrate how to design a data science framework to benefit battery electrode manufacturing, a case study that derives a Gaussian process regression (GPR)-based framework to predict battery electrode mass loading is given [26]. By incorporating an automatic relevance determination (ARD) kernel, this GPR framework is able to directly quantify the importance of four intermediate manufacturing parameters and analyse their influence on the prediction of battery electrode mass loading.

Derived from Bayesian theory, GPR performs nonparametric regression with Gaussian processes [27]. That is, for any inputs, the corresponding probability distribution over the function $f\left( x \right)$ follows a Gaussian distribution as:

$$ f\left( x \right) \sim {\text{GPR}}\left( {m\left( x \right),k\left( {x,x^{\prime}} \right)} \right) $$

(3.1)

where $m\left( x \right)$ and $k\left( {x,x^{\prime}} \right)$ denote the mean and covariance functions respectively, and are expressed by:

$$ \left\{ {\begin{array}{*{20}l} {m\left( x \right) = E\left( {f\left( x \right)} \right)} \hfill \\ {k\left( {x,x^{\prime}} \right) = E\left[ {\left( {f\left( x \right) - m\left( x \right)} \right)\left( {f\left( {x^{\prime}} \right) - m\left( {x^{\prime}} \right)} \right)} \right]} \hfill \\ \end{array} } \right. $$

(3.2)

Here $E()$ represents the expectation. In practice, $m\left( x \right)$ is generally set to zero to simplify the calculation [27]. $k\left( {x,x^{\prime}} \right)$ is also called the kernel function; it describes the degree of relevance between a target observation in the training data set and the predicted output, based on the similarity of the respective inputs.

In a regression problem, the prior distribution of the outputs ${\varvec{y}}$ can be expressed by:

$$ {\varvec{y}} \sim N\left( {0,k\left( {x,x^{\prime}} \right) + \sigma_{n}^{2} I_{n} } \right) $$

(3.3)

$N()$ indicates a normal distribution, $\sigma_{n}^{2}$ is the noise variance, and $I_{n}$ is the $n \times n$ identity matrix. Supposing the testing set $x^{\prime}$ and the training set $x$ follow the same Gaussian distribution, the predicted outputs $y^{\prime}$ follow a joint prior distribution with the training outputs $y$ as:

$$ \left[ {\begin{array}{*{20}c} y \\ {y^{\prime}} \\ \end{array} } \right] \sim N\left( {0,\left[ {\begin{array}{*{20}c} {k\left( {x,x} \right) + \sigma_{n}^{2} I_{n} } & {k\left( {x,x^{\prime}} \right)} \\ {k\left( {x,x^{\prime}} \right)^{T} } & {k\left( {x^{\prime},x^{\prime}} \right)} \\ \end{array} } \right]} \right) $$

(3.4)

where $k\left( {x,x} \right)$, $k\left( {x^{\prime},x^{\prime}} \right)$, and $k\left( {x,x^{\prime}} \right)$ represent the covariance matrices among inputs from training set, testing set, as well as training and testing sets, respectively.

To guarantee the performance of GPR, the hyperparameters $\theta$ in the covariance function need to be optimized using the $n$ points of the training set. One efficient solution is to minimize the negative log marginal likelihood $L\left( \theta \right)$ [28]:

$$ \left\{ {\begin{array}{*{20}l} {L\left( \theta \right) = \frac{1}{2}\log \left[ {\det \lambda \left( {\varvec{\theta}} \right)} \right] + \frac{1}{2}y^{T} \lambda^{ - 1} \left( {\varvec{\theta}} \right)y + \frac{n}{2}\log \left( {2\pi } \right)} \hfill \\ {\lambda \left( {\varvec{\theta}} \right) = k\left( {\varvec{\theta}} \right) + \sigma_{n}^{2} I_{n} } \hfill \\ \end{array} } \right. $$

(3.5)
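As an illustrative sketch of Eq. (3.5), the snippet below evaluates the negative log marginal likelihood with NumPy and picks the best length scale from a small candidate grid. The isotropic squared-exponential kernel, the function names, and the toy data are assumptions made for this example only; a practical implementation would normally use a gradient-based optimizer rather than a grid.

```python
import numpy as np

def se_kernel(xa, xb, length_scale, signal_std):
    """Isotropic squared-exponential kernel (an assumed choice for this sketch)."""
    sq_dist = ((xa[:, None, :] - xb[None, :, :]) ** 2).sum(axis=-1)
    return signal_std**2 * np.exp(-0.5 * sq_dist / length_scale**2)

def negative_log_marginal_likelihood(x, y, length_scale, signal_std, noise_std):
    """L(theta) of Eq. (3.5), with lambda = k + noise_std^2 * I."""
    n = len(y)
    lam = se_kernel(x, x, length_scale, signal_std) + noise_std**2 * np.eye(n)
    chol = np.linalg.cholesky(lam)  # lam = chol @ chol.T
    # alpha = lam^{-1} y, obtained from two triangular systems
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    log_det = 2.0 * np.log(np.diag(chol)).sum()  # log det(lam)
    return 0.5 * log_det + 0.5 * y @ alpha + 0.5 * n * np.log(2.0 * np.pi)

# Toy one-dimensional data (made up for illustration).
x = np.linspace(0.0, 1.0, 6)[:, None]
y = np.sin(2.0 * np.pi * x[:, 0])

candidates = [0.05, 0.1, 0.3, 1.0, 3.0]
best = min(
    candidates,
    key=lambda ls: negative_log_marginal_likelihood(x, y, ls, 1.0, 0.1),
)
print("length scale with the lowest L(theta):", best)
```

The Cholesky factorization is used instead of an explicit inverse and determinant because it is the numerically more stable way to evaluate both terms.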

After optimizing the hyperparameters of GPR, the predicted output $y^{\prime}$ can be obtained at dataset $x^{\prime}$ through calculating the corresponding conditional distribution $p\left( {y^{\prime}{|}x^{\prime},x,y} \right)$ as:

$$ p\left( {y^{\prime}{|}x^{\prime},x,y} \right) \sim N\left( {y^{\prime}{|}\overline{y}^{\prime},{\text{cov}}\left( {y^{\prime}} \right)} \right) $$

(3.6)

with

$$ \left\{ {\begin{array}{*{20}l} {\overline{y}^{\prime } = k\left( {x,x^{\prime } } \right)^{T} \left[ {k\left( {x,x} \right) + \sigma_{n}^{2} I_{n} } \right]^{ - 1} y} \hfill \\ {{\text{cov}}\left( {y^{\prime } } \right) = k\left( {x^{\prime } ,x^{\prime } } \right) - k\left( {x,x^{\prime } } \right)^{T} \left[ {k\left( {x,x} \right) + \sigma_{n}^{2} I_{n} } \right]^{ - 1} k\left( {x,x^{\prime } } \right)} \hfill \\ \end{array} } \right. $$

where $\overline{y}^{\prime}$ stands for the mean values of the predictions and ${\text{cov}}\left( {y^{\prime}} \right)$ denotes the covariance matrix that reflects the uncertainty range of these predictions. More details on these equations of GPR can be found in [27].
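The predictive mean and covariance given with Eq. (3.6) can be written in a few lines of NumPy. The following is a minimal sketch assuming a zero mean function and an isotropic squared-exponential kernel with fixed hyperparameters; the function names and the toy data are illustrative and are not taken from [26] or [27].

```python
import numpy as np

def se_kernel(xa, xb, length_scale=1.0, signal_std=1.0):
    """Isotropic squared-exponential kernel (an assumed choice for this sketch)."""
    sq_dist = ((xa[:, None, :] - xb[None, :, :]) ** 2).sum(axis=-1)
    return signal_std**2 * np.exp(-0.5 * sq_dist / length_scale**2)

def gpr_predict(x_train, y_train, x_test, noise_std=0.1):
    """Return the predictive mean and covariance at x_test."""
    k_xx = se_kernel(x_train, x_train) + noise_std**2 * np.eye(len(x_train))
    k_xs = se_kernel(x_train, x_test)  # k(x, x')
    k_ss = se_kernel(x_test, x_test)   # k(x', x')
    mean = k_xs.T @ np.linalg.solve(k_xx, y_train)
    cov = k_ss - k_xs.T @ np.linalg.solve(k_xx, k_xs)
    return mean, cov

# Toy data (made up for illustration).
x_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.sin(x_train[:, 0])
x_test = np.array([[0.5], [1.5], [2.5]])

mean, cov = gpr_predict(x_train, y_train, x_test)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # per-point predictive std
for m, s in zip(mean, std):
    print(f"prediction {m:.3f} +/- {2.0 * s:.3f}")
```

The diagonal of the covariance matrix gives the predictive variance at each test input, which is what provides the uncertainty range around each predicted value.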

3.3.2.1 ARD Kernel for Feature Selection

As two early stages of the battery electrode manufacturing chain, mixing and coating play pivotal roles in determining the mass loading of the battery electrode, which in turn affects the performance of the final manufactured cell. In this case study, a GPR-based data science model is derived to predict battery electrode mass loading and quantify the importance weights of four battery manufacturing parameters of interest, as illustrated in Fig. 3.6. Specifically, these four parameters include three mixing feature variables, namely mass content (MC), solid-to-liquid ratio (STLR), and slurry viscosity (V), and one coating process parameter, the comma gap (CG). GPR is integrated with four ARD structure-based kernel functions (exponential, squared exponential, Matérn 3/2, and Matérn 5/2), which are defined as follows:

$$ k_{{{\text{ARDEX}}}} \left( {i,i^{\prime}} \right){ } = \sigma_{{{\text{EX}}}}^{2} \exp \left[ { - \left( {\begin{array}{*{20}c} {\frac{{\left\| {i_{{{\text{MC}}}} - i^{\prime}_{{{\text{MC}}}} } \right\|}}{{\sigma_{{{\text{MC}}}} }} + \frac{{\left\| {i_{{{\text{STLR}}}} - i^{\prime}_{{{\text{STLR}}}} } \right\|}}{{\sigma_{{{\text{STLR}}}} }}} \\ { + \frac{{\left\| {i_{{{\text{CG}}}} - i^{\prime}_{{{\text{CG}}}} } \right\|}}{{\sigma_{{{\text{CG}}}} }} + \frac{{\left\| {i_{V} - i^{\prime}_{V} } \right\|}}{{\sigma_{{\text{V}}} }}} \\ \end{array} } \right)} \right] $$

(3.7)

$$ k_{{{\text{ARDSE}}}} \left( {i,i^{\prime}} \right){ } = \sigma_{{{\text{SE}}}}^{2} \exp \left[ { - \frac{1}{2}\left( {\begin{array}{*{20}c} {\frac{{\left\| {i_{{{\text{MC}}}} - i^{\prime}_{{{\text{MC}}}} } \right\|^{2} }}{{\sigma_{{{\text{MC}}}}^{2} }} + \frac{{\left\| {i_{{{\text{STLR}}}} - i^{\prime}_{{{\text{STLR}}}} } \right\|^{2} }}{{\sigma_{{{\text{STLR}}}}^{2} }}} \\ { + \frac{{\left\| {i_{{{\text{CG}}}} - i^{\prime}_{{{\text{CG}}}} } \right\|^{2} }}{{\sigma_{{{\text{CG}}}}^{2} }} + \frac{{\left\| {i_{V} - i^{\prime}_{V} } \right\|^{2} }}{{\sigma_{V}^{2} }}} \\ \end{array} } \right)} \right] $$

(3.8)

$$ \left\{ {\begin{array}{*{20}l} {k_{{{\text{ARDM}}3/2}} \left( {i,i^{\prime}} \right) = \sigma_{{{\text{M}}3/2}}^{2} \left( {1 + \sqrt 3 r} \right)\exp \left( { - \sqrt 3 r} \right)} \hfill \\ {r = \sqrt {\frac{{\left\| {i_{{{\text{MC}}}} - i^{\prime}_{{{\text{MC}}}} } \right\|^{2} }}{{\sigma_{{{\text{MC}}}}^{2} }} + \frac{{\left\| {i_{{{\text{STLR}}}} - i^{\prime}_{{{\text{STLR}}}} } \right\|^{2} }}{{\sigma_{{{\text{STLR}}}}^{2} }} + \frac{{\left\| {i_{{{\text{CG}}}} - i^{\prime}_{{{\text{CG}}}} } \right\|^{2} }}{{\sigma_{{{\text{CG}}}}^{2} }} + \frac{{\left\| {i_{V} - i^{\prime}_{V} } \right\|^{2} }}{{\sigma_{V}^{2} }}} } \hfill \\ \end{array} } \right. $$

(3.9)

$$ \left\{ {\begin{array}{*{20}l} {k_{{{\text{ARDM}}5/2}} \left( {i,i^{\prime}} \right) = \sigma_{{{\text{M}}5/2}}^{2} \left( {1 + \sqrt 5 r + \frac{5}{3}r^{2} } \right)\exp \left( { - \sqrt 5 r} \right)} \hfill \\ {r = \sqrt {\frac{{\left\| {i_{{{\text{MC}}}} - i^{\prime}_{{{\text{MC}}}} } \right\|^{2} }}{{\sigma_{{{\text{MC}}}}^{2} }} + \frac{{\left\| {i_{{{\text{STLR}}}} - i^{\prime}_{{{\text{STLR}}}} } \right\|^{2} }}{{\sigma_{{{\text{STLR}}}}^{2} }} + \frac{{\left\| {i_{{{\text{CG}}}} - i^{\prime}_{{{\text{CG}}}} } \right\|^{2} }}{{\sigma_{{{\text{CG}}}}^{2} }} + \frac{{\left\| {i_{V} - i^{\prime}_{V} } \right\|^{2} }}{{\sigma_{V}^{2} }}} } \hfill \\ \end{array} } \right. $$

(3.10)

where all these ARD structure-based kernels take the same input vector $i = \left\{ {i_{{{\text{MC}}}} ,i_{{{\text{STLR}}}} ,i_{{{\text{CG}}}} ,i_{V} } \right\}$. $\sigma_{{{\text{MC}}}}$, $\sigma_{{{\text{STLR}}}}$, $\sigma_{{{\text{CG}}}}$, and $\sigma_{V}$ are four hyperparameters that reflect the relevance and importance of the manufacturing parameters MC, STLR, CG, and viscosity, respectively. Based on these ARD kernels, the workflow of designing an ARD kernel-based GPR model for feature selection and prediction of manufactured battery electrode mass loading is detailed in Table 3.2.
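For readers who prefer code to formulas, the four kernels of Eqs. (3.7) to (3.10) can be sketched in NumPy as below. The function names are chosen for this example, the feature order (MC, STLR, CG, V) and all numerical values are assumptions, and the exponential kernel follows Eq. (3.7) as written, summing the scaled absolute differences.

```python
import numpy as np

def _scaled_abs_diff(a, b, length_scales):
    """|a_d - b_d| / sigma_d for every pair of rows, shape (n_a, n_b, n_features)."""
    return np.abs(a[:, None, :] - b[None, :, :]) / length_scales

def ard_exponential(a, b, length_scales, signal_std=1.0):
    """Eq. (3.7): sum of per-feature scaled absolute differences."""
    return signal_std**2 * np.exp(-_scaled_abs_diff(a, b, length_scales).sum(axis=-1))

def ard_squared_exponential(a, b, length_scales, signal_std=1.0):
    """Eq. (3.8)."""
    r2 = (_scaled_abs_diff(a, b, length_scales) ** 2).sum(axis=-1)
    return signal_std**2 * np.exp(-0.5 * r2)

def ard_matern32(a, b, length_scales, signal_std=1.0):
    """Eq. (3.9)."""
    r = np.sqrt((_scaled_abs_diff(a, b, length_scales) ** 2).sum(axis=-1))
    return signal_std**2 * (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

def ard_matern52(a, b, length_scales, signal_std=1.0):
    """Eq. (3.10)."""
    r = np.sqrt((_scaled_abs_diff(a, b, length_scales) ** 2).sum(axis=-1))
    poly = 1.0 + np.sqrt(5.0) * r + (5.0 / 3.0) * r**2
    return signal_std**2 * poly * np.exp(-np.sqrt(5.0) * r)

# Two hypothetical samples with features ordered as (MC, STLR, CG, V).
samples = np.array([[0.5, 1.0, 100.0, 2.0],
                    [0.6, 1.2, 120.0, 2.5]])
sigmas = np.array([0.1, 0.5, 20.0, 1.0])  # made-up ARD length scales
print(ard_squared_exponential(samples, samples, sigmas))
```

Because each feature has its own length scale, a very large value for one of them makes the kernel almost insensitive to that feature, which is the mechanism ARD uses to express that a manufacturing parameter is less relevant.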

Table 3.2 Detailed workflow of designing ARD kernel-based GPR for feature selection and prediction of manufactured battery electrode mass loading, reprinted from [26], with permission from Elsevier


Note that taking the exponential of the negative learned $\theta_{{{\text{hp}}}}$ and normalizing $W$ is a common way to make the quantified importance weights of the manufacturing parameters easier to compare. Other weighting strategies, such as taking the inverse of $\theta_{{{\text{hp}}}}$, could slightly change the final weight values but would not influence the importance ranking of these parameters. A manufacturing parameter with a higher ranking (larger normalized weight) is more crucial than the others for predicting battery electrode mass loading. In light of this, all four parameter weights are quantified in the same way in this case study. Following the workflow, the weights of the input items, including MC, STLR, and viscosity from mixing and CG from coating, can be directly quantified to reflect their importance. Reliable feature selection can then be carried out based on these quantified importance weights. Moreover, by assigning a separate hyperparameter to each input variable, the designed GPR model with ARD kernel structure is able to generate an explanatory subset of features, which further benefits the performance and generalization ability of battery electrode mass loading prediction.
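The weighting step can be illustrated with a short Python sketch. It assumes that the learned hyperparameters are positive numbers stored per parameter; the values of `theta_hp` below are hypothetical and serve only to show that the exponential and the inverse weighting lead to the same ranking.

```python
import math


def normalized_importance(theta):
    """Weights from exp(-theta), normalized so that they sum to 1."""
    raw = {name: math.exp(-value) for name, value in theta.items()}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


def inverse_importance(theta):
    """Alternative weighting 1/theta, normalized (assumes positive theta)."""
    raw = {name: 1.0 / value for name, value in theta.items()}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


def ranking(weights):
    """Parameter names sorted from most to least important."""
    return sorted(weights, key=weights.get, reverse=True)


# Hypothetical hyperparameter values, chosen only for illustration.
theta_hp = {"MC": 1.2, "STLR": 1.0, "CG": 0.4, "viscosity": 2.5}

w_exp = normalized_importance(theta_hp)
w_inv = inverse_importance(theta_hp)
```

Both mappings decrease monotonically for positive hyperparameters, so `ranking(w_exp)` and `ranking(w_inv)` agree even though the weight values differ.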

3.3.2.2 Results and Discussions

To analyse and evaluate the feature selection and regression performance of GPR with various ARD kernels, two tests are carried out: one with all four manufacturing parameter items and another with a reduced set of manufacturing parameter items.

First, to quantify the importance weights of the manufacturing parameters of interest for the prediction of manufactured battery electrode mass loading, all parameter items including MC, STLR, CG, and viscosity are adopted as inputs of the GPR models. The prediction performance of all four ARD kernels is evaluated using fourfold cross-validation. Figure 3.7 and Table 3.3 show the prediction results and the related performance indicators, respectively, for the four GPRs with different ARD kernels.

Table 3.3 Performance indicators for electrode mass loading prediction by using all items, reprinted from [26], with permission from Elsevier


Most observations in Fig. 3.7 closely match the outputs of the four GPR models, indicating that all four ARD kernel functions provide satisfactory prediction results for the manufactured battery electrode mass loading. Quantitatively, owing to its simple structure, the GPR with the ARDEX kernel finishes training within 13.547 s, but its prediction accuracy is the worst, with an RMSE of 1.177 mg/cm². In contrast, the GPR with the ARDMatern5/2 kernel has the longest training time of 14.487 s (a 6.9% increase) but the lowest RMSE of 1.084 mg/cm² (a 7.9% decrease). Since the training time of all these kernel functions stays within 14.5 s, the computational effort of all four ARD kernels is acceptable. In conclusion, when all four manufacturing parameters are adopted as input items of the ARD kernel-based GPR model, the expected accuracy can be achieved for the manufactured battery electrode mass loading prediction.

To further reflect the deviations of the electrode mass loading prediction results, the predicted versus actual plots (PVAPs) for the GPRs with all four ARD kernel functions are shown in Fig. 3.8. In theory, observations at the far left or right of a PVAP, i.e., those furthest from the average value, exert the most leverage and effectively pull the prediction line towards themselves. For a model with good prediction performance, the observations should lie close to the perfect prediction line. According to Fig. 3.8, all observations cluster around the perfect prediction lines without obvious outliers, implying that GPRs with ARD kernel functions provide satisfactory electrode mass loading prediction results with only small deviations when all manufacturing parameter items are used.

Next, based on the well-optimized GPR models, the normalized importance weights obtained from the hyperparameters of all four ARD kernel functions are plotted in Fig. 3.9. Interestingly, although the weight values of each feature item differ between kernels, the trend of the feature importance weights is similar for all four GPRs. Specifically, CG always presents the largest importance weight, which is nearly twice that of STLR. In contrast, viscosity gives the smallest importance weight (nearly five times smaller than that of CG). The importance weights of STLR are slightly larger than those of MC for all four ARD kernel functions. This finding signifies that, among these four electrode manufacturing parameters, CG is the most important item and must be selected for battery electrode mass loading prediction. STLR and MC are the second and third most important parameter items, whereas viscosity makes the least contribution to predicting the mass loading of the manufactured battery electrode.

To further assess the electrode mass loading prediction performance of the ARD kernel functions, four GPRs with the corresponding conventional kernel functions (EX, SE, Matern3/2 and Matern5/2) are used as benchmarks. Note that these conventional kernels without ARD structure cannot directly quantify the importance weights of the manufacturing parameters of interest, because all four parameter items share the same hyperparameter within the kernel. Figure 3.10 and Table 3.4 illustrate the electrode mass loading prediction results and the corresponding performance indicators obtained with all four parameter items as inputs to the GPRs with conventional kernel functions, based on fourfold cross-validation. The EX kernel presents the worst results, while Matern3/2 and Matern5/2 give better mass loading predictions. The RMSEs of all conventional kernel functions are slightly worse than those of the related ARD kernels (5% increase). Therefore, apart from the advantage of explaining parameter importance, GPRs with ARD kernels also show competitive performance in the mass loading prediction of the manufactured battery electrode.

Table 3.4 Performance indicators for kernel comparisons by using all parameter items, reprinted from [26], with permission from Elsevier


3.3.3 Case 2: Battery Electrode Property Classification with RF

In this study, another data science-based solution is given by deriving an improved random forest (RF) classification model to effectively classify electrode properties and to quantify the feature importance and correlations among four early manufacturing parameters, as shown in Fig. 3.11 [29]. To be specific, the model inputs include the slurry active material mass content (AMMC), solid-to-liquid ratio (StoLR) and viscosity from the mixing stage, and one coating process parameter named comma gap (CG). The model output is the labelled class of electrode mass loading or porosity.

3.3.3.1 RF Technique and Feature Analyses Solutions

As a typical ensemble learning solution, RF integrates numerous individual decision trees (DTs), as illustrated in Fig. 3.12. In general, the classification and regression tree (CART) is adopted as the DT of RF owing to its simplicity and nonparametric behaviour. The main process of RF training is to produce various de-correlated DTs. To reduce the variance, a bootstrap resampling method called “bagging” is utilized here [30]. Additionally, to restrain the correlations among these DTs, the best split of each node is searched within a random subset of $m$ features chosen from all $M$ features. In this context, DTs can be built without pruning, resulting in a relatively low computational effort. Besides, by adopting various bootstrap samples and node features, the noise immunity of RF is improved by averaging different DTs.
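The two sources of randomness described above, bootstrap resampling of observations and random selection of $m$ out of $M$ features at a node, can be sketched as follows. The helper names are hypothetical, and the out-of-bag helper is an addition for illustration only; the sketch does not grow an actual tree.

```python
import random


def bootstrap_sample(n_observations, rng):
    """Draw n indices with replacement (the bagging resample for one tree)."""
    return [rng.randrange(n_observations) for _ in range(n_observations)]


def out_of_bag(n_observations, in_bag):
    """Indices that were never drawn for this tree."""
    drawn = set(in_bag)
    return [i for i in range(n_observations) if i not in drawn]


def candidate_features(feature_names, m, rng):
    """Pick m of the M features at random as split candidates for one node."""
    return rng.sample(feature_names, m)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
features = ["AMMC", "StoLR", "viscosity", "CG"]

in_bag = bootstrap_sample(20, rng)
oob = out_of_bag(20, in_bag)
node_features = candidate_features(features, 2, rng)
```

Because each tree sees a different resample and each node a different feature subset, the trees disagree in their individual errors, which is what makes averaging them effective.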

To effectively quantify the importance of battery manufacturing parameters of interest, two different types of feature importance (FI) including the unbiased FI and gain improvement FI are utilized in this study. Table 3.5 illustrates the detailed process to obtain the unbiased FI.

Table 3.5 Detailed process to obtain the unbiased FI


Apart from the unbiased FI, another effective FI is obtained by summing the gain improvements of Gini impurity changes. In classification, the Gini impurity is used to reflect how good a potential split is at a specific node of a DT [31]. The detailed process to calculate the gain improvement FI is described in Table 3.6.
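As a rough illustration of this idea, the Gini impurity of a node and the gain of a split can be computed as in the Python sketch below. The weighting of each gain by the share of observations reaching the node is a common convention assumed here; the exact procedure of the chapter is the one given in Table 3.6.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(parent, left, right):
    """Impurity decrease obtained by splitting parent into left and right."""
    n = len(parent)
    children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - children


def gain_importance(splits, n_total):
    """Sum node-weighted gains per feature over all recorded splits."""
    importance = {}
    for feature, parent, left, right in splits:
        weight = len(parent) / n_total  # share of observations at this node
        gain = weight * gini_gain(parent, left, right)
        importance[feature] = importance.get(feature, 0.0) + gain
    return importance


# Hypothetical node: four observations separated perfectly by a CG split.
parent = ["low", "low", "high", "high"]
fi = gain_importance([("CG", parent, parent[:2], parent[2:])], n_total=4)
```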

Table 3.6 Detailed process to obtain Gain improvement FI


In addition, quantifying the correlations of different manufacturing parameter pairs is also vital to understanding battery manufacturing. To achieve this, the predictive measure of association (PMOA) is adopted. PMOA is calculated by comparing all potential splits with the optimal one found during DT training. The best surrogate decision split is the one that generates the maximum PMOA value, which reflects the correlation between the two features. In this context, PMOA reflects the similarity between different decision rules for splitting observations. Supposing $x_{e}$ and $x_{g}$ are two feature variables of interest $\left( {e \ne g} \right)$, the PMOA between the optimal split $x_{e} < u$ and the surrogate split $x_{g} < v$ can be calculated as:

$$ {\text{PMOA}}_{e,g} = \frac{{\min \left( {{\text{Pl}},\Pr } \right) - 1 + {\text{Pl}}_{e} l_{g} + \Pr_{e} r_{g} }}{{\min \left( {{\text{Pl}}, \Pr } \right)}} $$

(3.11)

where l and r denote the left and right children of a node; Pl is the proportion of observations with $x_{e} < u$, while Pr is the proportion of observations with $x_{e} \ge u$; ${\text{Pl}}_{e} l_{g}$ is the proportion of observations with both $x_{e} < u$ and $x_{g} < v$; ${\text{Pr}}_{e} r_{g}$ is the proportion of observations with both $x_{e} \ge u$ and $x_{g} \ge v$. PMOA does not exceed 1, and a larger PMOA implies a stronger correlation between the related feature pair.
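A minimal Python sketch of Eq. (3.11) is given below for two feature columns and two given thresholds. The function name and the toy data are assumptions for illustration; in practice the thresholds $u$ and $v$ come from the trained DTs.

```python
def pmoa(x_e, x_g, u, v):
    """PMOA between the splits x_e < u and x_g < v, following Eq. (3.11)."""
    n = len(x_e)
    p_left = sum(1 for a in x_e if a < u) / n    # Pl
    p_right = 1.0 - p_left                       # Pr
    both_left = sum(1 for a, b in zip(x_e, x_g) if a < u and b < v) / n
    both_right = sum(1 for a, b in zip(x_e, x_g) if a >= u and b >= v) / n
    smaller = min(p_left, p_right)
    if smaller == 0.0:
        raise ValueError("the optimal split must fill both children")
    return (smaller - 1.0 + both_left + both_right) / smaller


# Hypothetical feature columns with six observations each.
x_e = [1, 2, 3, 4, 5, 6]
same_rule = pmoa(x_e, [1, 2, 3, 4, 5, 6], u=3.5, v=3.5)
partial_rule = pmoa(x_e, [1, 2, 5, 4, 3, 6], u=3.5, v=3.5)
opposite_rule = pmoa(x_e, [6, 5, 4, 3, 2, 1], u=3.5, v=3.5)
```

When the surrogate split sends every observation to the same child as the optimal split, the value reaches its upper bound of 1; a surrogate that is no better than always guessing the larger child gives a value of 0 or below.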

Figure 3.13 illustrates the data science framework of designing the RF-based model for the classification and sensitivity analysis of specific battery electrode manufacturing parameters. This framework can be divided into four main parts as follows:

Part 1. Data preprocessing and RF-based model construction: After collecting relevant data from the battery electrode manufacturing chain, the outliers are first removed and the class labels are set for the outputs. In this study, both the mass loading and the porosity of the manufactured battery electrode are assigned five class labels. Then the inputs–output pairs illustrated in Fig. 3.11 are generated to train all DTs within RF. In theory, two main hyperparameters of the RF classification model need to be tuned: the number of DTs (J) and the number of feature items considered at each split (m). Some points should be considered when tuning these two hyperparameters. On the one hand, higher accuracy and generalization ability can in theory be obtained by increasing J, but too many DTs also increase the computational burden of the derived RF. On the other hand, both the performance and the correlations of the DTs within the RF model are affected by m. A high value of m benefits the strength of each DT but also makes the DTs more correlated. To obtain suitable hyperparameters in this study, an effective tuning approach called randomized search [32] is utilized.

Part 2. Feature importance analysis: In this part, to quantify the feature importance of all manufacturing parameters of interest and analyse their effects on the classification of electrode mass loading and porosity, two quantitative metrics, the unbiased FI and the gain improvement FI, are adopted. Specifically, the unbiased FI is calculated based on the process in Table 3.5, while the gain improvement FI is obtained based on the process in Table 3.6.

Part 3. Feature correlation analysis: After quantifying the importance of the mixing and coating parameters, the PMOA values of each parameter pair are calculated by Eq. (3.11) and visualized as an $M \times M$ heat map. The correlations between each pair of manufacturing parameters can then be quantified by these PMOAs. Larger PMOAs theoretically indicate that higher correlations exist between the parameter pairs. The two PMOAs of a parameter pair can differ in the heat map, depending on which manufacturing parameter provides the optimal split of the DTs.

Part 4. RF classification model reconstruction: After the comparison of FI and the analysis of parameter correlations, the most important manufacturing parameters that affect battery electrode property classification results are selected. Then the RF could be reconstructed with reduced parameters for new electrode property classifications.

Following these steps, an effective RF model-based data science framework can be designed not only to analyse the importance and correlations of early manufacturing parameters from mixing and coating, but also to classify the mass loading and porosity of the manufactured battery electrode into proper categories.
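The randomized search used in Part 1 to tune J and m can be outlined as below. This is a generic sketch rather than the procedure of [32]: the search ranges are arbitrary, and `toy_score` is a hypothetical stand-in for a cross-validated accuracy that a real study would obtain by training an RF for each sampled pair.

```python
import random


def randomized_search(score_fn, n_features, n_iter=20, max_trees=500, seed=0):
    """Sample (J, m) pairs at random and keep the best-scoring pair."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {
            "J": rng.randint(10, max_trees),  # number of decision trees
            "m": rng.randint(1, n_features),  # features tried at each split
        }
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical score surface (not a real model evaluation)."""
    return -abs(params["J"] - 200) / 200.0 - 0.1 * abs(params["m"] - 2)


best_params, best_score = randomized_search(toy_score, n_features=4)
```

Compared with an exhaustive grid, only `n_iter` configurations are evaluated, which keeps the tuning cost fixed regardless of how wide the ranges of J and m are.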

3.3.3.2 Electrode Mass Loading Analysis

Feature analyses: For the electrode mass loading classification, following the steps from Tables 3.5 and 3.6, both the unbiased FI and the gain improvement FI of all four feature variables are quantified, as shown in Fig. 3.14. Interestingly, although the value levels of these two FIs are significantly different, a similar trend is obtained for all feature variables. CG presents the highest values for both the unbiased FI (4.78) and the gain improvement FI (0.037), indicating that this feature variable is the most important one. StoLR and AMMC provide the second and third largest values for both types of FI. The viscosity gives the smallest values for the unbiased FI (0.67) and the gain improvement FI (0.022), indicating that this feature contributes the least to classifying the manufactured electrode mass loading.

Next, the heat map of the PMOAs of all feature pairs is created to evaluate the correlations among the four features for the electrode mass loading case, as illustrated in Fig. 3.15. Quantitatively, AMMC and StoLR show the largest correlation, with a PMOA of 0.72. This correlation result is consistent with the observations from manufacturing experiments, and the study additionally demonstrates how an RF-based data science framework can support the interpretation of correlations among the manufacturing feature variables of interest. This could help engineers to efficiently understand and manage their battery electrode manufacturing chain.

Electrode mass loading classification: To evaluate the battery electrode mass loading classification results, a test using the derived RF model with all features as inputs is carried out first. According to its confusion matrix (CM) in Fig. 3.16, a satisfactory classification accuracy of 90.2% is achieved. Quantitatively, the best classification results are obtained for the classes “very high” and “very low” with 100% $P_{\rm{rate}}$, while the worst classification result is obtained for the “low” class with 72.7% $P_{\rm{rate}}$. This is mainly because two observations are incorrectly classified as “very low” and one observation is classified as “medium” in this case.

Next, to further evaluate the effects of each manufacturing feature on the classification of manufactured electrode mass loading, four different cases with various combinations of three feature items are tested. To be specific, Case 1 contains CG, AMMC and StoLR. Case 2 includes CG, AMMC and viscosity. Case 3 includes CG, StoLR, and viscosity. Case 4 is composed of AMMC, StoLR, and viscosity. The performance metrics of these four cases are given in Table 3.7. Case 1 presents the best classification result with 86.6% $macroP$, 89.8% $macroR$ and 90.0% $macroF1$, which are just 3.3, 1.9 and 0.1% less than those obtained with all feature items. This implies that using CG, AMMC, and StoLR is sufficient for mass loading classification. Interestingly, without CG, the performance metrics of Case 4 decrease considerably, indicating that CG plays a significantly important role in electrode mass loading classification.
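For readers who wish to reproduce such indicators from a confusion matrix, a hedged Python sketch is given below. It assumes that rows are true classes and columns are predicted classes, and that $macroF1$ is the unweighted mean of the per-class F1 scores; the definitions actually used in [29] may differ. The example matrix is hypothetical.

```python
def macro_metrics(cm):
    """Macro precision, recall and F1 from a square confusion matrix.

    cm[i][j] counts observations of true class i predicted as class j.
    """
    k = len(cm)
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[i][c] for i in range(k))  # column sum
        actual = sum(cm[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


def accuracy(cm):
    """Share of observations on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


# Hypothetical three-class confusion matrix (not the one in Fig. 3.16).
cm = [[5, 1, 0],
      [1, 4, 1],
      [0, 0, 6]]
macro_p, macro_r, macro_f1 = macro_metrics(cm)
```

Macro-averaging gives every class the same weight, so a poorly classified small class lowers the indicators as much as a poorly classified large one.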

Table 3.7 Performance indicators for battery electrode mass loading classification, reprinted from [29], with permission from IEEE


3.3.3.3 Electrode Porosity Analysis

The test of classifying manufactured electrode porosity is also carried out. In this test, the inputs are the same as those from the mass loading test, while the output here becomes electrode porosity.

Feature analyses: Fig. 3.17 illustrates the corresponding unbiased FI and gain improvement FI for electrode porosity classification. According to Fig. 3.17, StoLR and viscosity are the two most important feature items, while AMMC is the least important one for classifying battery electrode porosity. Next, according to the association estimates of the corresponding feature pairs in Fig. 3.18, the pair of AMMC and StoLR presents the highest PMOA of 0.84, indicating that these two early manufacturing parameters are strongly correlated in the classification of battery electrode porosity.

Electrode porosity classification: With all four manufacturing parameters as inputs, the CM reflecting the classification results of the manufactured battery electrode porosity is illustrated in Fig. 3.19. This test provides a classification accuracy of 70.7%; the errors are mainly caused by mismatched observations, particularly for the class label “high”. In comparison with the electrode mass loading case, the electrode porosity class cannot be fully determined by these four manufacturing parameters of interest.

Next, to further investigate the effects of these manufacturing parameters on electrode porosity classification, four tests with the same parameter combinations as in the mass loading case are compared, and their performance metrics are given in Table 3.8. Specifically, by adopting the three most important manufacturing parameters (StoLR, CG, and viscosity), Case 3 shows the best classification result with 59.4% $macroP$, 60.8% $macroR$, and 59.7% $macroF1$. However, the overall classification performance for electrode porosity is worse than for electrode mass loading. This signifies that, for electrode porosity classification, additional related manufacturing parameters should be considered to further enhance the classification performance.

Table 3.8 Performance indicators for battery electrode porosity classification, reprinted from [29], with permission from IEEE


In summary, the electrode mass loading can be well classified based on these four manufacturing parameters, with a $macroF1$ of 90.1%, and CG plays the most pivotal role in this classification. This result is reasonable, as both the coating weight and thickness, which determine the battery electrode mass loading, are significantly affected by CG. For the electrode porosity classification, the $macroF1$ is just 66.4%, indicating that more manufacturing parameters need to be considered to better classify battery electrode porosity. This result is also expected, as battery electrode porosity is significantly affected by drying parameters such as temperature and pressure. Not surprisingly, AMMC and StoLR are strongly correlated for both mass loading and porosity. This is mainly because the ratio between the slurry solid component and mass is directly related to the active material property. In contrast, no direct relations exist for the other parameter pairs. In addition, the active material mass content cannot directly influence physical properties of the battery electrode such as porosity, which makes AMMC a less important parameter here. In this context, to further improve the electrode porosity classification performance, more manufacturing parameters from drying and calendering, such as the drying temperature, pressure, and calendering speed, should be considered.

3.4 Battery Cell Manufacturing

3.4.1 Overview of Battery Cell Manufacturing

Figure 3.20 illustrates the general steps of battery cell manufacturing, which mainly include electrode cutting, cell assembly, electrolyte filling, forming, and testing. The manufactured electrode is first cut into different sizes for various battery types such as the coin cell, cylindrical cell, and pouch cell. Two options are typically used to cut the manufactured electrode: die cutting and laser-beam cutting [33]. Both have merits and demerits. Die cutting more easily causes delamination of the anode edge and bending of the current collector, while laser-beam cutting produces less damage to the electrodes. However, when laser-beam cutting is used for the cathode, the aluminium spatter may promote dendrite growth and eventually lead to cell failure. In the cell assembly process, the assembly of electrode and separator is a main source of variation. Electrode misalignment strongly affects both the charging/discharging capacity and the cell ageing behaviour [34, 35]. Specifically, the overlapping area of anode and cathode is decreased first, and dendrites then start to grow because of non-uniform Li-ion deposition at the edges [36]. Moreover, the separator is selected based on different characteristics. For example, its weight and thickness affect the gravimetric and volumetric performance of the manufactured cell, while its wettability and porosity affect the electrolyte quantity and the cell service life [37, 38].

After the cell is assembled, electrolyte filling and wetting are the next steps, and they also strongly affect the final performance of the manufactured battery. Here the electrolyte is an ionic conductor between the active materials of the electrodes that ensures ion exchange. Poorly wetted electrodes should be avoided in this stage, as they can result in increased impedance and dendrite formation in the manufactured battery. Finally, as the two most time-consuming processes, the forming and testing of manufactured cells should be well designed and controlled. In theory, a stable SEI is formed by consuming active lithium and electrolyte materials during battery formation [39]. This SEI layer provides protection against the irreversible consumption of electrolyte and damage to the particles of active materials [40]. To eliminate the generated gaseous products, the manufactured battery cell needs to be degassed and then tested after resealing. Here the battery electrochemical test is customized for the intended application. In principle, the cell current and voltage need to be accurately monitored to generate accurate information on the capacity, cycling loss, and energy and power densities of the manufactured cell.

3.4.2 Case 1: Battery Cell Capacities Prediction with SVR

In this study, a data science-based model with support vector regression (SVR) is developed to describe how the variations in assembled battery half-cell properties including the thickness (μm), mass loading (g/m²) and porosity (%) affect the final manufactured battery capacities at different current rates (C/20, 1C and 2C) [41]. Figure 3.21 illustrates the schematic diagram of all parts and the predefined inputs–output matrix with 115 observations in total.

To predict manufactured battery cell capacities under different C-rates, an SVR-based data science model is developed. SVR is the regression variant of the support vector machine, a typical data science tool for both classification and regression [42]. To achieve reasonable prediction, the best regression hyperplane is searched during the training stage of SVR. The hyperplane is determined by an orthogonal weight vector $\omega$ that gives the widest separation margin. Suppose the training dataset is denoted as $ TD = \left( {X_{i} ,Y_{i} } \right),\;i = 1,2, \ldots ,l,\;X \in R^{m} $, and the hyperplane is $\left( {\omega \cdot X_{i} + b} \right) = 0$. To ensure that all observations can be predicted well, the following constraints should be satisfied:

$$ Y_{i} \left( {\omega \cdot X_{i} + b} \right) \ge 1,\quad i = 1,2, \ldots ,l $$

(3.12)

Then the process to maximize the regression margin is defined as:

$$ \left\{ {\begin{array}{*{20}l} {\min \;\left\| \omega \right\|_{2}^{2} /2} \hfill \\ {\begin{array}{*{20}c} {{\text{s.t.}}} & {Y_{i} \left( {\omega \cdot X_{i} + b} \right) \ge 1,\quad i = 1,2, \ldots ,l} \\ \end{array} } \hfill \\ \end{array} } \right. $$

(3.13)

After constructing the Lagrange function, this process can be expressed in terms of the Lagrange multipliers $\alpha_{i}$ (with $N = l$ denoting the number of training observations) as:

$$ \left\{ {\begin{array}{*{20}l} {\mathop {\min }\limits_{\alpha } \frac{1}{2}\sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{N} {\alpha_{i} \alpha_{j} Y_{i} Y_{j} } } \left( {X_{i} \cdot X_{j} } \right) - \sum\limits_{i = 1}^{N} {\alpha_{i} } } \hfill \\ {{\text{s.t.}}\;\sum\limits_{i = 1}^{N} {\alpha_{i} } Y_{i} = 0} \hfill \\ {\alpha_{i} \ge 0,\quad i = 1,2, \ldots ,N} \hfill \\ \end{array} } \right. $$

(3.14)

Based on Eq. (3.14), SVR is capable not only of guaranteeing the accuracy of regression, but also of maximizing the margin on both sides of the hyperplane [43].

To improve the nonlinear prediction performance of SVR, kernel functions should be incorporated into SVR. Specifically, by using proper kernels, raw data from the original space can be effectively mapped to a high-dimensional space, and the SVR-based regression model can then be trained on the data in this high-dimensional space with a linear approach. Supposing $\phi \left( e \right)$ is a function that maps the input space to a new feature space, the kernel function can be expressed by:

$$ K\left( {e,g} \right) = \phi \left( e \right) \cdot \phi \left( g \right) $$

(3.15)

According to Eqs. (3.14) and (3.15), the cost function to maximize the regression margin after introducing the kernel function becomes:

$$ W\left( \alpha \right) = \frac{1}{2}\sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{N} {\alpha_{i} \alpha_{j} Y_{i} Y_{j} } } K\left( {X_{i} ,X_{j} } \right) - \sum\limits_{i = 1}^{N} {\alpha_{i} } $$

(3.16)

Based on the above discussion, kernel functions play a key role in determining the prediction performance of SVR. Different kernel functions perform differently in different applications, so the kernel should be carefully selected. In this study, to predict the cell capacities of the manufactured battery, an SVR-based data science model with a quadratic kernel is utilized.
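The relation in Eq. (3.15) can be checked numerically for a quadratic kernel. The sketch below assumes the homogeneous form $K\left( {e,g} \right) = \left( {e \cdot g} \right)^{2}$ on two-dimensional inputs; the exact quadratic kernel (for example, with an offset or scaling term) used in [41] is not specified here, so this form is an assumption.

```python
import math


def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))


def quadratic_kernel(e, g):
    """Homogeneous quadratic kernel K(e, g) = (e . g)^2."""
    return dot(e, g) ** 2


def phi(x):
    """Explicit feature map for 2-D inputs: phi(e) . phi(g) = (e . g)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


# Hypothetical two-dimensional inputs.
e = (1.0, 2.0)
g = (3.0, -1.0)
k_direct = quadratic_kernel(e, g)   # evaluated in the original space
k_mapped = dot(phi(e), phi(g))      # evaluated in the mapped space
```

Both routes give the same value, but the kernel route never forms $\phi$ explicitly, which is why kernels remain cheap even when the mapped space is high-dimensional.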

After preparing the relevant battery manufacturing data, a test is carried out to predict capacities at a C-rate that lies between the maximum and minimum C-rates used for SVR training, even though the data at that C-rate are not used to train the model. To achieve this, four items, namely the mass loading, thickness, porosity, and C-rate of the cycling current, are utilized as the inputs of SVR, and the model is trained on the manufactured cell capacity data at all C-rates except one specific C-rate, which is used for validation.

Figure 3.22 illustrates the predicted capacity values of three different cases and their relations to the input items. For case (a), manufacturing data at C-rates of C/20, C/2, 1C, and 2C are utilized for training, while the data at C/5 are used for validation. The RMSE for this case is 0.128 mAh and the corresponding $R^{2}$ is 0.98. For case (b), manufacturing data at C-rates of C/20, C/5, 1C, and 2C are used to train the model, while C/2 is used for validation. Here the RMSE is 0.128 mAh and $R^{2}$ is 0.97. For case (c), manufacturing data at C-rates of C/20, C/2, C/5 and 2C are utilized for training, while the data at 1C are used for validation. Here the RMSE is 0.150 mAh and $R^{2}$ is 0.97. Based on these results, the prediction accuracy of manufactured cell capacities at low C-rates is satisfactory, even when they are not included in the training data used to build the SVR model. The prediction results for the 1C capacity are slightly worse, mainly because they depend on the 2C capacity data, which present higher variability than the data of the C/20, C/2, and C/5 cases.
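The hold-out scheme and the two indicators reported above can be sketched as follows. The record layout, the field names and all numbers are hypothetical; the sketch only shows how data at one C-rate are withheld and how RMSE and $R^{2}$ are computed from actual and predicted capacities.

```python
import math


def leave_one_rate_out(records, held_out_rate):
    """Split records into training and validation sets by C-rate."""
    train = [r for r in records if r["c_rate"] != held_out_rate]
    valid = [r for r in records if r["c_rate"] == held_out_rate]
    return train, valid


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical records; the capacities are made-up numbers.
records = [
    {"c_rate": 0.05, "capacity_mAh": 4.0},
    {"c_rate": 0.2, "capacity_mAh": 3.8},
    {"c_rate": 0.5, "capacity_mAh": 3.5},
    {"c_rate": 1.0, "capacity_mAh": 3.0},
    {"c_rate": 2.0, "capacity_mAh": 2.2},
    {"c_rate": 0.2, "capacity_mAh": 3.7},
]
train, valid = leave_one_rate_out(records, held_out_rate=0.2)
```

Holding out a whole C-rate, rather than random observations, is what tests whether the model can interpolate to a cycling condition it has never seen.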

In addition, the effects of each input item on the capacity prediction performance are also analysed. The results show that the C-rate alone is not sufficient as an input to obtain satisfactory cell capacity predictions. Besides, the mass loading, thickness, and C-rate (M, T, C) are the input items with the most significant effects on the prediction accuracy of manufactured cell capacity.

Note that the ability of the SVR model with an extra current input to predict manufactured cell capacity at other cycling conditions is very useful for battery manufacturers, as it could eliminate the need to run experiments to obtain the corresponding data. This is especially important for low C-rate cases, because the related experiments are highly time-consuming.

3.4.3 Case 2: Battery Cell Capacity Classification with RUBoost

In the applications of battery cell manufacturing, material component formulations play a pivotal role in determining manufactured battery performance such as cell capacity. In this case study, an effective data science framework based on an advanced ensemble learning technology named RUBoost is designed to efficiently classify manufactured battery capacity considering the class imbalance issue, while the sensitivity of material component parameters of interest is also analysed [44]. The structure of this RUBoost model is shown in Fig. 3.23. To be specific, five parameters of material components including the active material weight-fraction (AMw), C65 weight-fraction (C65w), CNF weight-fraction (CNFw), Binder weight-fraction (Binderw), and Binder type are inputs of RUBoost model, while related manufactured cell capacity under C/20 is adopted as model output.

In multi-class classification applications, class imbalance occurs easily. Two techniques, data sampling and boosting, are generally adopted to alleviate the effects of class imbalance. With data sampling, the observations in the training data are balanced by removing samples (undersampling) or adding samples (oversampling). In theory, as the size of the training data is reduced, undersampling lowers the computational effort but can also lead to a loss of information. In contrast, oversampling is free of information loss, but it can cause overfitting and increase the computational burden.

On the other hand, boosting techniques can also be used to overcome the issue of class imbalance. One typical boosting solution is adaptive boosting (AdaBoost) [45]. In this study, to further improve the battery cell capacity classification performance under class imbalance, a hybrid solution named random undersampling boosting (RUBoost) is derived. Suppose a training dataset $\left\{ {\left( {X_{1} , C_{1} } \right),\left( {X_{2} , C_{2} } \right), \ldots ,\left( {X_{N} , C_{N} } \right)} \right\}$ has $N$ observations, where $X_{i}$ is a vector consisting of $P$ items of interest, $C_{i}$ is the class output with $K$ labels, and $L\left( X \right)$ is a weak learner that outputs a class based on $X$. The detailed steps to establish the related RUBoost-based ensemble learning model are shown in Table 3.9.

Table 3.9 Detailed process to establish RUBoost data science-based model for multi-classification

One key difference between RUBoost and AdaBoost is that RUBoost applies random undersampling in each iteration to reduce the observations of the majority class to the designed percentage $\left( {P\% } \right)$, which yields a temporary dataset ${\text{TD}}_{j}^{\prime }$ with a new weight distribution $W_{j}^{\prime }$. The weak learner $L^{\prime \left( j \right)} \left( X \right)$ is then trained using ${\text{TD}}_{j}^{\prime }$ and $W_{j}^{\prime }$. With this undersampling scheme, RUBoost not only provides balanced observations for training but also reduces the computational effort.
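
The interplay of undersampling and boosting can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not the procedure of Table 3.9 or the implementation of [44]: it uses the multi-class SAMME form of AdaBoost, one-split decision stumps as weak learners, undersampling of every class to the size of the rarest class, and a learning rate that simply scales each learner's weight. The toy data and all function names are hypothetical.

```python
import math
import random

def undersample(y, rng):
    """Indices of a balanced subset: every class cut to the rarest class size."""
    by_class = {}
    for i, c in enumerate(y):
        by_class.setdefault(c, []).append(i)
    n_min = min(len(members) for members in by_class.values())
    return [i for members in by_class.values() for i in rng.sample(members, n_min)]

def fit_stump(X, y, w, classes):
    """Weighted one-split decision stump, standing in for the weak learner L(X)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = {c: 0.0 for c in classes}
            right = {c: 0.0 for c in classes}
            for xi, yi, wi in zip(X, y, w):
                (left if xi[f] <= t else right)[yi] += wi
            l_lab, r_lab = max(left, key=left.get), max(right, key=right.get)
            err = sum(w) - left[l_lab] - right[r_lab]
            if best is None or err < best[0]:
                best = (err, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= t else r_lab

def fit_ruboost_sketch(X, y, n_rounds, rate, seed=0):
    rng = random.Random(seed)
    classes = sorted(set(y))
    n = len(y)
    w = [1.0 / n] * n                       # weight distribution over all N samples
    ensemble = []
    for _ in range(n_rounds):
        idx = undersample(y, rng)           # temporary balanced dataset TD'_j
        w_sub = [w[i] for i in idx]
        total = sum(w_sub)
        w_sub = [v / total for v in w_sub]  # renormalized weights W'_j
        learner = fit_stump([X[i] for i in idx], [y[i] for i in idx], w_sub, classes)
        miss = [learner(x) != c for x, c in zip(X, y)]  # evaluated on the full set
        err = sum(wi for wi, m in zip(w, miss) if m)
        err = min(max(err, 1e-10), 1.0 - 1e-10)
        alpha = rate * (math.log((1.0 - err) / err) + math.log(len(classes) - 1))
        if alpha <= 0.0:                    # no better than chance: skip this round
            continue
        ensemble.append((alpha, learner))
        w = [wi * math.exp(alpha) if m else wi for wi, m in zip(w, miss)]
        norm = sum(w)
        w = [wi / norm for wi in w]         # misclassified samples gain weight
    return classes, ensemble

def predict(model, x):
    """Weighted vote of all weak learners."""
    classes, ensemble = model
    votes = {c: 0.0 for c in classes}
    for alpha, learner in ensemble:
        votes[learner(x)] += alpha
    return max(votes, key=votes.get)

# Hypothetical one-feature toy data (not the chapter's dataset).
X = [[0.1 * i] for i in range(30)]
y = ["low"] * 5 + ["medium"] * 15 + ["high"] * 10
model = fit_ruboost_sketch(X, y, n_rounds=50, rate=0.1)
train_pred = [predict(model, x) for x in X]
accuracy = sum(p == c for p, c in zip(train_pred, y)) / len(y)
print("learners kept:", len(model[1]), "training accuracy:", round(accuracy, 3))
```

Each weak learner is fitted only on the small balanced subset, which is where the reduction in computational effort comes from, while the error and the weight update still use all observations so that no sample is permanently ignored.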

The detailed procedure for using this RUBoost-based data science framework to classify cell capacity and to perform sensitivity analysis of manufacturing parameters is summarized in Fig. 3.24 and contains the following four main steps:

Step 1: Data curation and preprocessing: Following the rule defined in Table 3.10, the capacities of both Li-ion battery types (LFP and LTO) are classified into three labels: low, medium, and high.
Table 3.10 Class labels of manufactured battery electrode capacity
Step 2: RUBoost model construction: In this step, the hyperparameters of RUBoost need to be set first. Three main hyperparameters are required for the RUBoost-based data science model: the number of decision trees (M), the learning rate (r), and the desired percentage of the data represented by the minority class $\left( {P\% } \right)$. A larger M can increase the classification accuracy of the model but also increases the computational effort and the risk of overfitting. In this study, M is selected iteratively by evaluating the classification error. The learning rate r sets the decay speed of each learner’s weight, while $P\%$ balances the class observations during the training stage. As recommended by [46], setting $r = 0.1$ and $P\%$ to the percentage of the minority class is a good choice.
Step 3: Sensitivity analysis: After the RUBoost-based model is trained, the sensitivity analysis of variable importance and correlations is carried out based on the Gini importance and PMOA. Interested readers are referred to [31] for the detailed procedure to calculate the Gini importance, while PMOA is calculated using Eq. (3.11). A heat map of the PMOAs is then generated to reflect the correlation of each parameter pair.
Step 4: Imbalanced classification of electrode properties: After the importance and correlations of the battery components of interest are quantified, the derived RUBoost model is used to predict the class of battery cell capacity. The confusion matrix (CM) is adopted here as the main performance indicator. Other performance indicators from Sect. 1.3.3 in Chap. 1 can also be used to evaluate the electrode property classification results.
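
The evaluation in Step 4 can be illustrated with a short Python sketch that builds a confusion matrix and derives the micro-averaged F1 score from it. The label vectors below are hypothetical and are not results from the chapter; the micro-F1 formula is the standard one, which may be written differently in Sect. 1.3.3 of Chap. 1.

```python
def confusion_matrix(y_true, y_pred, classes):
    """cm[i][j] counts samples of true class i that were predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    cm = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm

def micro_f1(cm):
    """Micro-averaged F1: pool TP, FP and FN over all classes before dividing."""
    k = len(cm)
    tp = sum(cm[i][i] for i in range(k))
    fp = sum(sum(cm[i][j] for i in range(k)) - cm[j][j] for j in range(k))  # column sum minus diagonal
    fn = sum(sum(cm[i]) - cm[i][i] for i in range(k))                      # row sum minus diagonal
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical true and predicted capacity classes (illustrative only).
classes = ["low", "medium", "high"]
y_true = ["low", "low", "medium", "medium", "medium", "high", "high", "high", "high", "medium"]
y_pred = ["low", "medium", "medium", "medium", "medium", "high", "high", "medium", "high", "medium"]

cm = confusion_matrix(y_true, y_pred, classes)
score = micro_f1(cm)
for name, row in zip(classes, cm):
    print(f"{name:>6}: {row}")
print("micro-F1:", score)
```

For single-label multi-class problems such as this one, every misclassified sample counts once as a false positive and once as a false negative, so the micro-F1 equals the overall accuracy (8 of 10 correct in this toy example). Per-class rows of the confusion matrix remain useful because they show whether the minority class is the one being missed.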

Following these four steps, two tests are carried out in which RUBoost-based models are established to classify battery cell capacity and to perform the sensitivity analysis of four component parameters of interest. For both tests, $r$ is set to 0.1 and $P\%$ to the percentage of the minority class. $M$ is determined iteratively by evaluating the classification error. Fivefold cross-validation is conducted, giving 110 training samples and 28 test samples for the LFP case, and 86 training samples and 22 test samples for the LTO case.
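
The reported split sizes are consistent with a fivefold partition, as the following Python sketch shows. The dataset totals of 138 (LFP) and 108 (LTO) are an assumption inferred here by adding the stated training and test sizes; they are not quoted from the chapter, and `fold_sizes` is a hypothetical helper name.

```python
def fold_sizes(n_samples, n_folds=5):
    """Split n_samples into n_folds parts whose sizes differ by at most one."""
    base, extra = divmod(n_samples, n_folds)
    return [base + 1 if i < extra else base for i in range(n_folds)]

# Totals inferred from the text (110 + 28 and 86 + 22); treat them as assumptions.
totals = {"LFP": 138, "LTO": 108}
splits = {
    name: [(n - test, test) for test in fold_sizes(n)]  # (training size, test size) per fold
    for name, n in totals.items()
}
for name, pairs in splits.items():
    print(name, pairs)
```

Because neither total is divisible by five, the folds cannot all have the same size; the figures quoted in the text match the larger folds, and the remaining folds have one test sample fewer.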

3.4.3.1 LFP Case Studies

For the capacity classification of the LFP-based battery cell, the classification error versus $M$ and the corresponding Gini importance are shown in Fig. 3.25. The classification error decreases to 0.02 after adopting just three decision trees, which suggests an approximately linear relationship between these formulation variables and the LFP-based cell capacity. LFPw has the largest Gini importance weight, while C65w has a slightly larger importance value than CNFw. Additionally, according to the heat map in Fig. 3.26, LFPw shows a relatively high correlation with C65w, CNFw, and Binderw.
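
One simple way to turn such an error curve into a choice of $M$ is sketched below in Python. The stopping rule (the smallest $M$ whose error lies within a tolerance of the best error on the curve) is an assumption, since the chapter does not state the exact criterion, and the error values are made up to mimic the shape of a quickly converging curve; they are not read from Fig. 3.25.

```python
def select_num_trees(errors, tol=1e-3):
    """Return the smallest number of trees whose error is within tol of the best.

    errors[m - 1] is the classification error obtained with m trees.
    """
    best = min(errors)
    for m, err in enumerate(errors, start=1):
        if err <= best + tol:
            return m
    return len(errors)  # not reached for a non-empty list; kept for safety

# Hypothetical error curve for M = 1, 2, ..., 6 (illustrative values only).
errors = [0.20, 0.08, 0.02, 0.02, 0.02, 0.02]
chosen_m = select_num_trees(errors)
print("selected M:", chosen_m)
```

Preferring the smallest adequate $M$ keeps the ensemble cheap to evaluate and limits the overfitting risk noted in Step 2.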

To evaluate the classification performance, the CMs obtained with all four manufacturing component parameters and with only the three most important parameters of the LFP-based battery cell capacity are illustrated in Fig. 3.27. The $microF1$ reaches 97.8% for the all-parameter case and 95.7% for the reduced-parameter case, indicating that the derived RUBoost model classifies LFP-based battery cell capacity with high accuracy.

3.4.3.2 LTO Case Studies

Next, the designed RUBoost-based data science framework is applied to the LTO-based battery to evaluate the influence of the same component parameters on cell capacity.

For the LTO-based battery cell capacity, the classification error converges to 0.06 after using 92 decision trees. As illustrated in Fig. 3.28, the binder type has the lowest Gini importance, while LTOw, Binderw, and C65w are the three most important parameters with weights over 0.08. According to the heat map in Fig. 3.29, the pair of LTOw and CNFw gives the highest PMOA, which is still less than 0.5, implying that the correlations are small for the classification of the LTO-based battery.
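
The Gini importance values discussed here are built from the impurity decrease achieved at tree splits. The following Python sketch shows that building block for a single split; the class labels are hypothetical, and the exact aggregation and normalization used in the chapter follow [31] and may differ from this simplified view.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Impurity removed by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted_children

# Hypothetical node with eight samples and one candidate split (illustrative only).
parent = ["low"] * 2 + ["medium"] * 4 + ["high"] * 2
left = ["low"] * 2 + ["medium"] * 2
right = ["medium"] * 2 + ["high"] * 2

parent_impurity = gini(parent)
decrease = gini_decrease(parent, left, right)
print("parent impurity:", parent_impurity)
print("impurity decrease:", decrease)
```

In the usual formulation, the importance of a parameter is obtained by accumulating such decreases over every split that uses the parameter across all trees, so a parameter that is rarely chosen for splitting, or that yields only small decreases, ends up with a low importance.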

According to the CMs of the all-parameter and reduced-parameter cases in Fig. 3.30, the classification results of the two cases are similar, with a $microF1$ of 94.4%. This indicates that a satisfactory cell capacity classification can be obtained for the LTO-based battery, and that the component parameters LTOw, Binderw, and C65w alone are sufficient to accurately classify the LTO-based battery cell capacity with the derived RUBoost-based data science framework.

In summary, class imbalance arises easily during the battery manufacturing process and degrades the classification of manufactured cell capacity. The proposed RUBoost-based data science framework offers advantages in accuracy, interpretability of both feature importance and correlations, its data-driven nature, and the ability to handle class imbalance. When other battery manufacturing data such as mixing speed, kneading intensity, temperature, and pressure are available, the framework has good potential for reliable multi-class classification and sensitivity analysis of these process parameters to support smarter battery manufacturing.

3.5 Summary

This chapter describes data science-based battery manufacturing management, which addresses the initial and key stage of the battery full lifespan. An overview of battery manufacturing is first given by dividing it into battery electrode manufacturing and battery cell manufacturing. A framework for using data science tools to manage battery manufacturing is then described, followed by a comparison of popular machine learning technologies used in battery manufacturing management in terms of their merits and drawbacks. For battery electrode manufacturing, two data science-based case studies are given, in which a GPR regression model and an RF classification model are derived to predict manufactured battery electrode properties (mass loading and porosity) and to perform sensitivity analyses of strongly coupled manufacturing parameters. For battery cell manufacturing, another two data science-based case studies are described, in which SVR and RUBoost-based models are designed to predict or classify manufactured cell capacities and to analyse the related manufacturing parameters. Illustrative results indicate that suitable data science tools can accurately capture battery product properties prior to manufacturing. Furthermore, they can automatically explain the interactions and effects of strongly coupled manufacturing parameters. This could significantly reduce the monitoring burden of the battery manufacturing line and help battery manufacturers identify cheaper and more efficient means of producing high-performance batteries.

References

Kwade A, Haselrieder W, Leithoff R, Modlinger A, Dietrich F, Droeder K (2018) Current status and challenges for automotive battery production technologies. Nat Energy 3(4):290–300
Kendrick E (2019) Advancements in manufacturing. In: Future lithium-ion batteries, pp 262–289
Turetskyy A, Thiede S, Thomitzek M, Von Drachenfels N, Pape T, Herrmann C (2020) Toward data-driven applications in lithium-ion battery cell manufacturing. Energy Technol 8(2):1900136
Ng M-F, Zhao J, Yan Q, Conduit GJ, Seh ZW (2020) Predicting the state of charge and health of batteries using data-driven machine learning. Nat Mach Intell 2(3):161–170
Aykol M, Herring P, Anapolsky A (2020) Machine learning for continuous innovation in battery technologies. Nat Rev Mater 5(10):725–727
Niri MF, Liu K, Apachitei G, Román-Ramírez LA, Lain M, Widanage D, Marco J (2022) Quantifying key factors for optimised manufacturing of Li-ion battery anode and cathode via artificial intelligence. Energy AI 7:100129
Wanner J, Weeber M, Birke KP, Sauer A (2019) Quality modelling in battery cell manufacturing using soft sensoring and sensor fusion—a review. In: Proceedings of 9th international electric drives production conference (EDPC), SV Veranstaltungen, Germany, Esslingen, 2019, pp 1–9
Schnell J, Reinhart G (2016) Quality management for battery production: a quality gate concept. Procedia CIRP 57:568–573
Schnell J, Nentwich C, Endres F, Kollenda A, Distel F, Knoche T, Reinhart G (2019) Data mining in lithium-ion battery cell production. J Power Sources 413:360–366
Thiede S, Turetskyy A, Kwade A, Kara S, Herrmann C (2019) Data mining in battery production chains towards multi-criterial quality prediction. CIRP Ann 68(1):463–466
Hoffmann L, Grathwol J-K, Haselrieder W, Leithoff R, Jansen T, Dilger K, Dröder K, Kwade A, Kurrat M (2020) Capacity distribution of large lithium-ion battery pouch cells in context with pilot production processes. Energy Technol 8(2):1900196
Cunha RP, Lombardo T, Primo EN, Franco AA (2020) Artificial intelligence investigation of NMC cathode manufacturing parameters interdependencies. Batteries Supercaps 3(1):60–67
Riexinger G, Doppler JP, Haar C, Trierweiler M, Buss A, Schöbel K, Ensling D, Bauernhansl T (2020) Integration of traceability systems in battery production. Procedia CIRP 93:125–130
Wessel J, Turetskyy A, Wojahn O, Herrmann C, Thiede S (2020) Tracking and tracing for data mining application in the lithium-ion battery production. Procedia CIRP 93:162–167
Knoche T, Surek F, Reinhart G (2016) A process model for the electrolyte filling of lithium-ion batteries. Procedia CIRP 41:405–410
Schönemann M, Bockholt H, Thiede S, Kwade A, Herrmann C (2019) Multiscale simulation approach for production systems. Int J Adv Manuf Technol 102(5):1373–1390
Kornas T, Knak E, Daub R, Bührer U, Lienemann C, Heimes H, Kampker A, Thiede S, Herrmann C (2019) A multivariate KPI-based method for quality assurance in lithium-ion-battery production. Procedia CIRP 81:75–80
Niri MF, Liu K, Apachitei G, Roman-Ramirez L, Lain M, Widanalage D, Marco J (2021) Machine-learning for Li-ion battery capacity prediction in manufacturing process. In: Proceedings of ECS meeting abstracts, p 427
Zhang S, Zhang C, Yang Q (2003) Data preparation for data mining. Appl Artif Intell 17(5–6):375–381
Liu K, Yang Z, Wang H, Li K (2021) Classifications of lithium-ion battery electrode property based on support vector machine with various kernels. In: Recent advances in sustainable energy and intelligent systems. Springer, Singapore, pp 23–34
Emilsson E, Dahllöf L (2019) Lithium-ion vehicle battery production. IVL Swedish Environmental Research Institute, Stockholm, Sweden
Liu K, Peng Q, Li K, Chen T (2022) Data-based interpretable modeling for property forecasting and sensitivity analysis of Li-ion battery electrode. Autom Innov 1–13
Lenze G, Bockholt H, Schilcher C, Froböse L, Jansen D, Krewer U, Kwade A (2018) Impacts of variations in manufacturing parameters on performance of lithium-ion-batteries. J Electrochem Soc 165(2):A314
Mohanty D, Hockaday E, Li J, Hensley D, Daniel C, Wood III D (2016) Effect of electrode manufacturing defects on electrochemical performance of lithium-ion batteries: cognizance of the battery failure sources. J Power Sources 312:70–79
Baunach M, Jaiser S, Schmelzle S, Nirschl H, Scharfer P, Schabel W (2016) Delamination behavior of lithium-ion battery anodes: influence of drying temperature during electrode processing. Drying Technol 34(4):462–473
Liu K, Wei Z, Yang Z, Li K (2021) Mass load prediction for lithium-ion battery electrode clean production: a machine learning approach. J Clean Prod 289:125159
Rasmussen CE, Nickisch H (2010) Gaussian processes for machine learning (GPML) toolbox. J Mach Learn Res 11:3011–3015
Liu D, Pang J, Zhou J, Peng Y, Pecht M (2013) Prognostics for state of health estimation of lithium-ion batteries based on combination Gaussian process functional regression. Microelectron Reliab 53(6):832–839
Liu K, Hu X, Zhou H, Tong L, Widanalage D, Marco J (2021) Feature analyses and modelling of lithium-ion batteries manufacturing based on random forest classification. IEEE/ASME Trans Mechatron 26(6):2944–2955
Cutler A, Cutler DR, Stevens JR (2012) Random forests. In: Ensemble machine learning. Springer, Boston, MA, pp 157–175
Liu H, Cocea M (2018) Induction of classification rules by Gini-index based rule generation. Inf Sci 436:227–246
Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13(2)
Pfleging W (2018) A review of laser electrode processing for development and manufacturing of lithium-ion batteries. Nanophotonics 7(3):549–573
Leithoff R, Fröhlich A, Dröder K (2020) Investigation of the influence of deposition accuracy of electrodes on the electrochemical properties of lithium-ion batteries. Energy Technol 8(2):1900129
Schilling A, Wiemers-Meyer S, Winkler V, Nowak S, Hoppe B, Heimes HH, Dröder K, Winter M (2020) Influence of separator material on infiltration rate and wetting behavior of lithium-ion batteries. Energy Technol 8(2):1900078
Heins TP, Leithoff R, Schlüter N, Schröder U, Dröder K (2020) Impedance spectroscopic investigation of the impact of erroneous cell assembly on the aging of lithium-ion batteries. Energy Technol 8(2):1900288
Francis CF, Kyratzis IL, Best AS (2020) Lithium-Ion battery separators for ionic–liquid electrolytes: a review. Adv Mater 32(18):1904205
Weber CJ, Geiger S, Falusi S, Roth M (2014) Material review of Li ion battery separators. In: Proceedings of American Institute of Physics Conference (AIP), TU Bergakademie, Germany, Freiberg, 2014, pp 66–81
Wood III DL, Li J, An SJ (2019) Formation challenges of lithium-ion battery manufacturing. Joule 3(12):2884–2888
Zhou Y, Su M, Yu X, Zhang Y, Wang J-G, Ren X, Cao R, Xu W, Baer DR, Du Y (2020) Real-time mass spectrometric characterization of the solid–electrolyte interphase of a lithium-ion battery. Nat Nanotechnol 15(3):224–230
Niri MF, Liu K, Apachitei G, Ramirez LR, Lain M, Widanage D, Marco J (2021) Machine learning for optimised and clean Li-ion battery manufacturing: revealing the dependency between electrode and cell characteristics. J Clean Prod 324:129272
Noble WS (2006) What is a support vector machine? Nat Biotechnol 24(12):1565–1567
Rebentrost P, Mohseni M, Lloyd S (2014) Quantum support vector machine for big data classification. Phys Rev Lett 113(13):130503
Liu K, Hu X, Meng J, Guerrero JM, Teodorescu R (2021) RUBoost-based ensemble machine learning for electrode quality classification in Li-ion battery manufacturing. IEEE/ASME Trans Mechatron. https://doi.org/10.1109/TMECH.2021.3115997 (in press)
Ying C, Qi-Guang M, Jia-Chen L, Lin G (2013) Advance and prospects of AdaBoost algorithm. Acta Automat Sin 39(6):745–758
Mounce S, Ellis K, Edwards J, Speight V, Jakomis N, Boxall J (2017) Ensemble decision tree models using RUSBoost for estimating risk of iron failure in drinking water distribution systems. Water Resour Manag 31(5):1575–1589

Author information

Authors and Affiliations

Warwick Manufacturing Group (WMG), University of Warwick, Coventry, UK
Kailong Liu
Department of Automation, University of Science and Technology of China, Hefei, China
Yujie Wang
School of Mechanical Engineering, University of Shanghai for Science and Technology, Shanghai, China
Xin Lai

Corresponding author

Correspondence to Kailong Liu.

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this chapter

Cite this chapter

Liu, K., Wang, Y., Lai, X. (2022). Data Science-Based Battery Manufacturing Management. In: Data Science-Based Full-Lifespan Management of Lithium-Ion Battery. Green Energy and Technology. Springer, Cham. https://doi.org/10.1007/978-3-031-01340-9_3

DOI: https://doi.org/10.1007/978-3-031-01340-9_3
Published: 09 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-01339-3
Online ISBN: 978-3-031-01340-9
eBook Packages: Engineering, Engineering (R0)
