Impact of material data in assembly delay prediction—a machine learning-based case study in machinery industry

Designing customized products for customer needs is a key characteristic of machine and plant manufacturers. Their manufacturing process typically consists of a design phase followed by planning and executing a production process of components required in the subsequent assembly. Production delays can lead to a delayed start of the assembly. Predicting potentially delayed components (we call those components assembly start delayers) in early phases of the manufacturing process can support an on-time assembly. In recent research, prediction models typically include information about the orders, workstations, and the status of the manufacturing system, but information about the design of the component is not used. Since the components of machine and plant manufacturers are designed specifically for the customer needs, we assumed that material data influence the quality of a model predicting assembly start delayers. To test our hypothesis, we followed the established CRISP-DM method to set up 12 prediction models at an exemplary machine and plant manufacturer utilizing a binary classification approach. These 12 models differed in the utilization of material data (including or excluding material data) and in the utilized machine learning algorithm (six algorithms per data case). Evaluating the different models revealed a positive impact of the material data on the model quality. With the achieved results, our study validates the benefit of using material data in models predicting assembly start delayers. Thus, we identified that considering data sources that are commonly not used in prediction models, such as material data, increases the model quality.


Introduction
Manufacturing companies are challenged to succeed in dynamic international markets requesting high-quality products, flexibility, on-time delivery, and a reasonable cost structure [1][2][3]. Here, short delivery times and adherence to delivery dates are key factors to differentiate from competitors. A typical example of this is the machine and plant manufacturing industry producing complex products consisting of numerous components [4,5]. Many of these components are customized, enabling tailor-made solutions for the customers' requirements. In general, the manufacturing process of machine and plant manufacturers starts with the design of the product and components, followed by the production planning, the purchasing of raw materials, and the production process to manufacture the individual components needed in the subsequent assembly process. In parallel to the production process, further components required in the assembly are purchased from suppliers. The task of the assembly is to assemble a product of higher complexity with predefined functions from a certain quantity of components in a partly multi-stage process in a given time [6]. Furthermore, many material flows converge in the assembly, leading to a high potential of delays [7]. Thus, an essential factor for meeting the delivery date is the start of the assembly on time and a prior timely supply of the components needed for assembly. Consequently, components produced in the processes upstream of the assembly have a direct influence on the performance of the assembly process. Assuming that all components are required to start the assembly process, even a single component supplied behind schedule will lead to a delayed start of assembly [8]. To meet delivery dates, it would be helpful to predict these delayed individual components (we call them assembly start delayers) in the early stages of the manufacturing process.
Based on an early prediction, measures such as close communication with the supplier, extra shifts to temporarily increase production capacity, or utilizing a different workstation can be derived to speed up the manufacturing process and thus, prevent assembly start delays.
With the increasing development of machine learning (ML) and the availability of big data, ML-based prediction models are becoming more and more established in the field of production planning and control. ML models have already been successfully applied to predict lead times of manufacturing processes [9] and to predict assembly start delayers [10]. Our previous research already showed that predicting assembly start delayers utilizing a binary classification is the recommended approach and outperforms approaches utilizing a lead time prediction to identify assembly start delayers [10]. Furthermore, when setting up and training a prediction model, the used data model has a central influence on the model quality of the prediction model [11,12]. For example, Burggraef et al. [9] have already discovered that material data defining all characteristics of the product to be manufactured, such as geometric specification, weights, or the material itself, are rarely used in ML models to predict lead times.
Comparing the business process of a machine and plant manufacturer with the usage of material data in prediction models, it is noticeable that the products of machine and plant manufacturers are typically tailor-made for each customer need [13,14]. As the product's characteristics strongly influence the processes needed for its manufacturing [15], the design phase of machine and plant manufacturers, including the material data specified within the design phase, also has a non-negligible influence on the manufacturing process. Consequently, we assume that the usage of material data in a model predicting assembly start delayers has an impact on its model quality. Nevertheless, material data are currently only rarely used in prediction models, and a validation that material data influence the respective model quality has not yet been performed.
Thus, our manuscript aims to set up an ML-based model for the prediction of assembly start delayers and to analyze and systematize the influence of material master data on the model quality. As a research method, we apply a case study at a machine and plant manufacturer. With the achieved results, our paper provides two main contributions:
• We developed a model to predict assembly start delayers utilizing a machine learning classification approach.
• We identified that material data influence the model quality of a model predicting assembly start delayers. However, there was only a slight influence.
Our paper is structured as follows. Section 2 first introduces the product structure and manufacturing processes in an engineer-to-order environment as well as available approaches to identify and predict assembly start delayers. Section 3 elaborates on our approach to quantify the impact of material data on the model quality predicting assembly start delayers utilizing ML. In Sect. 4, the results are presented and discussed. Section 5 critically reviews the limitations of our approach and the results obtained. Furthermore, the implications for further research are derived. Finally, a summary is given in the last section.

State of the art
The products of machine and plant manufacturers typically consist of several hundred to several thousand components. These are procured from suppliers or manufactured in the company's production facilities. Purchased components can be procured on an order-anonymous basis, such as for standard components, or on an order-specific basis, such as for special and drawing components. The procurement of components from suppliers as well as the manufacturing of components in the in-house production belong to the processes upstream of the assembly [16]. Since the assembly is a convergence point of several material flows, the risk of delays due to missing components is increased [17]. One established model to analyze converging material flows is the assembly flow element developed by Schmidt [18] with further developments and applications in the assembly flow diagram and supply diagram [16,18]. In all models, the so-called completer is the last inflow to an assembly order and is therefore the component that was supplied last by the processes upstream of the assembly. A completer can be completed on time (before the planned start date of the assembly) or late (after the planned start of the assembly). A late finalization of a completer, therefore, leads to a delay in the start of assembly. In this manuscript, we define such components as "assembly start delayers" (see also Chapter 1). Assuming that all components are necessary to start the assembly, the schedule variance of the assembly start delayer determines the earliest possible start date of the assembly. Accordingly, a temporal acceleration of the manufacturing and/or procurement process of an assembly start delayer has the biggest potential to push a delayed assembly start back to the target date. However, the supply diagram is primarily designed to analyze data relating to the past and to identify general issues such as an overall bad assembly supply situation in individual assembly areas. To derive case-specific countermeasures to accelerate individual production orders, further analysis is needed.
In production, typically scheduling techniques are used to derive order sequences and to calculate lead times of work orders used to determine the start dates and end dates of the respective orders and subsequently to determine the assembly start delayers [19]. The order sequence is defined according to certain rules considering for example the available production capacities, the technical requirements, the demand dates, and the system status [8,20,21]. Further, especially for remanufacturing systems, also environmental objectives are considered [22,23]. To optimize the lead time of an order, the determination of its waiting time depending on the machine's utilization is essential [24]. Here, performance curves considering functional relationships between logistic parameters such as lead times, throughput, and stock play a key role [24,25]. Nevertheless, deviations from the schedule may occur leading to an inaccurate determination of the assembly start delayers. Besides determining the assembly start delayers based on calculated lead times utilizing scheduling techniques, it is also possible to predict lead time directly. By predicting the lead times, completion dates can be determined early and deviations from the schedule can be detected [26]. In the past, many approaches for the prediction of lead times have been established. For example, Cheng and Gupta [27] investigated methods from the field of operations research (OR) such as Constant (CON), Random (RAN), or Total-Work (TWK). With the increasing development of ML, new methods for predicting lead times have emerged (see, for example, [28][29][30][31]).
A systematic literature review conducted by Burggraef et al. [9] has analyzed existing approaches focusing on the prediction of lead times in the research fields of ML and OR and classified them according to the three criteria data class, data origin, and used method/algorithm. Looking at the data class, the authors identified that the majority of publications examined use order data and information about the system status of the production system. In detail, 95 % of their 42 publications examined use order data, and 62 % use information about the system status. Jia, Zhang et al. [32], Berlec and Govekar [33] or Gramdi [34] for example use order data such as start and end dates of orders or order-specific processing times for the prediction of lead times, whereas the authors in [28] and [35], for instance, use a combination of order data and information about the system status such as the machine utilization, processing times or the queue length. In contrast to the order data and information about the system, with 24 % of the 42 publications examined, machine data are slightly less used. For example, the authors in [36] include the machine ID and the authors in [37] include the so-called 'equipment data' containing information about machines and tools in their prediction models. Further, Burggraef et al. [9] identified Gyulai, Pfeiffer et al. [38] and Karagolan and Karademir [39] with a portion of only 5 % of the 42 publications examined as the only authors who include material data such as dimensions or specifications of the product in their prediction models. These findings highlight that material data were rarely used compared to order data, information about the system status, and machine data.
That being said, the business process of a machine and plant manufacturer typically hinges on tailor-made products for each customer need [13,14]. Thus, the design phase in the business process and the associated documents herein can be said to have a non-negligible influence on the desired product. Furthermore, the product design is also the basis for the production planning determining the process to manufacture the respective components [15]. Accordingly, the material data specified in the design phase are also influencing the manufacturing process. Consequently, we assume that the usage of material data in a model predicting assembly start delayers has an impact on its model quality. Nevertheless, material data are currently only rarely used in prediction models.
Utilizing the findings of the systematic literature review in [9], the authors in [10] applied different ML algorithms to a total of 24 different prediction models on four different levels of detail to identify the modeling approach with the highest model quality in predicting assembly start delayers. Their models on the coarsest level of detail predicted assembly start delayers utilizing a binary classification. Their models on the three finer levels of detail predicted assembly start delayers via a prediction of different lead times (component lead times, order lead times, and operation lead times) utilizing a regression approach and subsequent postprocessing operations to identify the assembly start delayers. After training the 24 prediction models based on a real data set of a machine and plant manufacturer and evaluating their model quality, they identified the coarsest level of detail utilizing the binary classification as the best modeling approach. Thus, one of their findings was that performing a binary classification to predict assembly start delayers outperformed the prediction of assembly start delayers based on a prior prediction of lead times utilizing a regression model. Accordingly, for our approach, applying a binary classification is recommended to predict assembly start delayers. Furthermore, the authors in [10] already used material data in all of their 24 prediction models, leading to good results. Nevertheless, as they did not systematically analyze the impact of material data on the model quality, there is still no analysis available proving that material data have an impact on the quality of models predicting lead times.
In summary, there are models available for the prediction of lead times, but they are not explicitly used for the prediction of assembly start delayers. Currently, there is only one approach available focusing on the prediction of assembly start delayers in the field of machine and plant manufacturers comparing a direct prediction of assembly start delayers with an indirect prediction based on a previous lead-time prediction. But still, there is no analysis performed on the impact of material data on the quality of models predicting lead times.
Consequently, in this work, we will focus on investigating the influence of material data on the quality of models predicting assembly start delayers. This systematic analysis is completely novel compared to recent research. For this purpose, the following research question is posed, considering the previous explanations: "What effect does the use of material data have on the model quality of a model predicting assembly start delayers?" Following our argumentation that the products of machine and plant manufacturers are typically designed tailor-made to meet the specific customer needs and that the material data, therefore, characterize a product, we formulate the following working hypothesis: "The model quality for the prediction of assembly start delayers increases when utilizing material data."

Modelling approach
Examining an exemplary use case is an established approach in the field of machine learning, especially in lead-time prediction (see, for example, [37,39,40,41]) and assembly start delayer prediction (see, for example, [10]). One motivation for examining an exemplary use case is to gain insights for real needs, such as the need of a manufacturing company, rather than to develop theories without practical relevance [42]. Accordingly, investigating an exemplary use case to answer our research question and to study our working hypothesis is an appropriate and established approach and thus was our approach of choice. Furthermore, as this work extends our previous research in the prediction of assembly start delayers [10], we investigated the same case at the previously chosen representative machine and plant manufacturer.
The methodology used in this manuscript is following the established Cross Industry Standard Process for Data Mining (CRISP-DM) [43,44] consisting of the six phases Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.

Business understanding
In the Business Understanding phase, we derived objectives and requirements from a business perspective and converted them into a data mining problem. The objective from a business perspective was to prevent delays due to missing components in the final assembly so that the predefined due date of a customer order can be met. Early detection of components that have a higher tendency of late finishing in their preprocessing stages would be helpful to prevent a subsequent delay in the final assembly, as the production planner can accelerate the order in the preprocessing stages. The company under observation develops machines for steel production, which are made up of several hundred components. These components are either procured from suppliers or manufactured in-house. An analysis carried out in the company beforehand showed that approx. 95 % of the assembly start delayers are components produced in the company's production. Thus, the scope of our prediction model was constrained to the components manufactured in-house. In the process upstream of the assembly, these in-house components are processed by various machines for mechanical and welding operations. In the prediction model, the components were classified as "assembly start delayer" (ASD) or "no assembly start delayer" (NASD), which was identified as a suitable modeling approach in our previous research [10]. For this classification, a slightly modified version of the definition of the assembly start delayers given in chapter 2 is applied: Instead of considering only one single assembly start delayer as a date-determining factor for the assembly start according to the definition of Beck and Schmidt [16,18] and thus assigning the highest potential for improvement to this component, several assembly start delayers were considered for each assembly order.
This extension is recommended, since considering only one assembly start delayer does not reveal whether this single one is an outlier or whether a large portion of the components is completed at a similar time. The modified assembly start delayer classification was defined as follows: If the schedule variance of a component is larger than or equal to 80 % of the maximum schedule variance of all components of an assembly order, which is the schedule variance of the actual assembly start delayer, then this component is considered as an assembly start delayer. In detail, we utilized the formula
$$\text{class}_{i} = \begin{cases} \text{ASD} & \text{if } SV_{i,j} \geq 0.8 \cdot SV_{j,\max} \\ \text{NASD} & \text{otherwise} \end{cases}$$
to assign one of the two classes ASD or NASD to every component $i$, where $SV_{i,j}$ is the schedule variance of component $i$ of assembly order $j$, calculated by
$$SV_{i,j} = CD_{i,j} - TSD_{j}$$
where $CD_{i,j}$ is the completion date of component $i$ of assembly order $j$ and $TSD_{j}$ is the target start date of assembly order $j$, and $SV_{j,\max}$ is the maximum schedule variance of all components of assembly order $j$, calculated by
$$SV_{j,\max} = CD_{j,\max} - TSD_{j}$$
where $CD_{j,\max}$ is the latest completion date of all components of assembly order $j$ (the completion date of the respective completer).
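The classification rule above can be illustrated with a minimal Python sketch. The component names, the dates, and the function names are hypothetical. How an assembly order is treated whose completer is on time is not specified in the text, so the handling of that case is an assumption made here.

```python
from datetime import date


def schedule_variance(completion_date, target_start_date):
    """SV_ij = CD_ij - TSD_j in days; positive values mean 'late'."""
    return (completion_date - target_start_date).days


def classify_components(completion_dates, target_start_date, share=0.8):
    """Label every component of one assembly order as 'ASD' or 'NASD'."""
    variances = {
        name: schedule_variance(cd, target_start_date)
        for name, cd in completion_dates.items()
    }
    sv_max = max(variances.values())  # schedule variance of the completer
    # Assumption (not stated in the text): if even the completer is on time,
    # no component delays the assembly start.
    if sv_max <= 0:
        return {name: "NASD" for name in variances}
    return {
        name: "ASD" if sv >= share * sv_max else "NASD"
        for name, sv in variances.items()
    }


# Hypothetical assembly order with a target start date of 10 March 2020.
order = {
    "shaft": date(2020, 3, 12),    # SV = 2 days
    "housing": date(2020, 3, 20),  # SV = 10 days (completer)
    "frame": date(2020, 3, 19),    # SV = 9 days
    "cover": date(2020, 3, 5),     # SV = -5 days (early)
}
labels = classify_components(order, target_start_date=date(2020, 3, 10))
print(labels)
```

In this example, the threshold is 0.8 · 10 = 8 days, so both the completer and the component finished one day before it are labeled ASD, which is the effect the modified definition is meant to capture.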
The time of application of the prediction models (prediction time), and thus the time of gaining knowledge about potential assembly start delayers, should be as early as possible within the production process, so that the production planner can accelerate the order in the manufacturing processes upstream of the assembly. For the prediction models within this study, we set the date of order creation, and thus the completion of order planning, as prediction time. At this point, all necessary information, such as bill of materials, operations, and machine assignments, is available.
In summary, we converted the business objective to a binary classification problem. Subsequently, to answer the research question, and with our hypothesis that the model quality for the prediction of assembly start delayers increases when utilizing material data, we derived our data mining approach: We compared ML-based binary classification models using a data set including material data with ML-based binary classification models using the same data set but excluding the material data (cf. Fig. 1). For both cases, "including material data" and "excluding material data," we applied several ML algorithms such as tree-based classifiers, support vector machines, or neural networks utilizing the Scikit-learn library or Keras library in Python (further details about the ML algorithms used are explained in chapter 3 D). In total, 12 models were created, six per case utilizing different ML algorithms. Thus, with our approach we compared the performance of the different ML algorithms in both cases to identify the impact of material data on the model quality and the best performing ML algorithm by evaluating the achieved model qualities. Such a systematic analysis of the impact of material data on the model quality is completely new in recent research (see Chapter 2).
Fig. 1 Modeling architecture to quantify the impact of material data on the model quality
To evaluate the different achieved model qualities, we applied a confusion matrix, since the output of all ML models is the binary classification "assembly start delayer / no assembly start delayer." The evaluation of the model quality with a confusion matrix is an established method and has already been demonstrated in other studies (see, for example, [45,46]). Based on the confusion matrix, we calculated the Matthews correlation coefficient (MCC) and the F-score as established evaluation metrics to compare the performance of the different ML algorithms on both data sets. As noted by the authors in [10,47], the MCC takes all four confusion matrix categories into account in a balanced way and thus is the most informative metric to evaluate a confusion matrix. Considering the MCC also ensured that our model was not just predicting the majority class in our data set, which is "no assembly start delayer." Furthermore, as recommended by the authors in [10], we considered the F-score as an evaluation metric since it focuses on the prediction of positives (assembly start delayers) only, which is the most important category in our case of interest. In detail, we used the F2-score, which weights recall twice as much as precision. This weighting is based on the assumption that it is more important to identify as many of the actual assembly start delayers as possible and to define acceleration measures for them, even at the cost of flagging more components than are actually delayed, than to miss individual assembly start delayers. By evaluating each ML model with these metrics, the impact of material data on the quality of a model predicting assembly start delayers can be determined. Furthermore, with the MCC and F2-score, we use the same metrics as in our previous research [10] and thus ensure comparability.
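Both metrics follow directly from the four confusion matrix counts. The following sketch uses the standard definitions of the MCC and the F-beta score. The function names and the example counts are hypothetical and are not results of this study.

```python
import math


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from the four confusion matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 if a row or column of the confusion matrix is empty.
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score; beta = 2 weights recall twice as much as precision."""
    b2 = beta * beta
    denominator = (1 + b2) * tp + b2 * fn + fp
    return 0.0 if denominator == 0 else (1 + b2) * tp / denominator


# Hypothetical test set: 72 ASDs (positives) and 229 NASDs (negatives).
# A model that always predicts the majority class NASD scores 0 on both metrics.
print(mcc(tp=0, fp=0, fn=72, tn=229), f_beta(tp=0, fp=0, fn=72))

# A hypothetical model that finds most ASDs at the cost of some false alarms.
print(mcc(tp=60, fp=20, fn=12, tn=209), f_beta(tp=60, fp=20, fn=12))
```

The first call shows why the MCC guards against a model that only predicts "no assembly start delayer": its accuracy would be high on an imbalanced data set, yet the MCC is 0.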

Data understanding
In the data understanding phase, according to the authors in [44], we collected and analyzed the data to identify data quality problems and to develop a solid understanding of the dataset. The data were collected from the Enterprise Resource Planning (ERP) and Advanced Planning and Scheduling System (APS) of the plant and machine manufacturer under observation with a period under review of one year. In detail, we collected data from the four data classes order data, machine data, material data, and system status, and thus follow the recommendation of the authors in [9]. The data export consisted of several separate CSV files containing assembly orders, the corresponding production orders and operations, as well as information on the material and the system status. To join the different files, we set up an entity-relationship diagram (see Fig. 2), enabling us to identify the primary keys, which are the prerequisite for their connection.
The complete dataset consisted of 356 assembly orders comprising 1,506 components supplied by the in-house production and thus is identical to the one used in our previous research [10]. These 1,506 in-house components are manufactured by a total of 3,187 production orders comprising 15,772 operations. With our modified definition of an assembly start delayer, we had a total of 24 % "assembly start delayers" and 76 % "no assembly start delayers" of all in-house components.
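Joining such exports along their primary keys can be sketched as follows. The file contents, column names, and key names are purely hypothetical, since the actual schema of the ERP and APS export is not given in the text.

```python
import csv
import io

# Hypothetical excerpts of three exported CSV files.
assembly_csv = """assembly_order_id,target_start_date
A1,2020-03-10
"""
production_csv = """production_order_id,assembly_order_id,component
P1,A1,shaft
P2,A1,housing
"""
operations_csv = """operation_id,production_order_id,machine
O1,P1,lathe
O2,P1,mill
O3,P2,welding
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def index_by(rows, key):
    """Build a lookup table on a primary key and reject duplicate keys."""
    index = {}
    for row in rows:
        if row[key] in index:
            raise ValueError(f"duplicate primary key {row[key]!r}")
        index[row[key]] = row
    return index


assemblies = index_by(read_rows(assembly_csv), "assembly_order_id")
production = index_by(read_rows(production_csv), "production_order_id")

# Follow the foreign keys: operation -> production order -> assembly order.
joined = []
for operation in read_rows(operations_csv):
    production_order = production[operation["production_order_id"]]
    assembly_order = assemblies[production_order["assembly_order_id"]]
    joined.append({**assembly_order, **production_order, **operation})

for row in joined:
    print(row)
```

Checking for duplicate keys while building the lookup tables is one simple way to surface data quality problems early, which is the purpose of this phase.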
Further, as recommended by the authors in [44] we focused on gaining a better understanding of the data and developing first ideas of relevant data fields for the prediction of assembly start delayers by performing an exploratory data analysis. In detail, we utilized several graphical techniques such as boxplots, scatter plots, or Pareto charts. For example, we analyzed the distribution of the total number of operations needed to manufacture ASDs and NASDs (cf. Fig. 3a) showing a slight deviation between both classes. Components manufactured in more operations have a slightly higher tendency of becoming an ASD. As another example, we plotted the distribution of the gross weight of ASDs and NASDs as an initial study of the impact of material data (cf. Fig. 3b). ASDs have a slightly higher mean and median gross weight than NASDs. Heavier components may need extra handling effort and transport time and therefore have a higher tendency of becoming an ASD.

Data preparation
With the gathered understanding of the data, we continued with preparing the final dataset for training the models by transforming and cleaning the initial raw data. In detail, we identified the relevant data fields for the prediction models by performing a correlation analysis as recommended by the authors in [48]. Subsequently, after further data preprocessing operations such as discretization, decomposition, normalization, and aggregation (see, for details, [49,50]), we defined the features for our data model, resulting in 17 features, although not all features are applied in all models (see Table 1).
Since tree-based classifiers from the Scikit-learn library and neural networks from the Keras library can only be trained on numerical variables in Python [51], the categorical variables such as "component name", "dispatcher" and "priority" were converted to Boolean values by performing One-Hot-Encoding. This increased the number of features to a total of 379. Due to the One-Hot-Encoding, our data set was transformed into a sparse matrix containing the same information but in a higher-dimensional space. This sparse matrix could, for example, hinder the optimization of a neural network due to a non-negligible number of zeros as input of the model. Furthermore, the encoded features could depend on each other. To investigate the correlations between the features, we created a 379 × 379 correlation matrix in the form of a lower triangular matrix, leading to 71,631 individual correlation coefficients, which were assigned to five bins of different correlation strengths (cf. Table 2) according to the established rules recommended by the authors in [52,53]. Initially, 1.4 % of all feature pairs showed at least a moderate correlation (a correlation coefficient higher than 0.5), and 1.5 % of the feature pairs showed a low correlation (Tables 1-5). This indicates an existing dependency between our features. Thus, a Principal Component Analysis (PCA) was performed to avoid a sparse matrix and to reduce the dependencies between the features to ensure a good model quality.
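As an illustration of the two steps above, the following sketch one-hot encodes a categorical variable and counts the feature pairs of a strictly lower triangular correlation matrix. The rows and column names are hypothetical. The pair count n(n − 1)/2 is a general identity, and 71,631 coefficients correspond to n = 379 features.

```python
def one_hot(rows, column):
    """Replace one categorical column by one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


def n_feature_pairs(n_features):
    """Number of entries below the diagonal of an n x n correlation matrix."""
    return n_features * (n_features - 1) // 2


# Hypothetical components with one categorical and one numerical feature.
rows = [
    {"priority": "high", "gross_weight": 120.0},
    {"priority": "low", "gross_weight": 15.5},
    {"priority": "high", "gross_weight": 48.0},
]
encoded = one_hot(rows, "priority")
print(encoded[0])
print(n_feature_pairs(379))
```

Every category adds one mostly-zero column, which is why a few categorical variables with many distinct values are enough to turn 17 features into several hundred.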
The improvement of the model quality by using a PCA has already been demonstrated in other studies (see, for example, [54]). By performing PCA, the 379 features were transformed into 46 principal components, which explain most of the variance of the original features. After performing PCA, we again performed a correlation analysis and assigned all correlation coefficients to the same five bins.
For training and evaluating the models, the dataset was divided into training and test sets with a ratio of 80 % training data to 20 % test data. This choice follows established ratios of approx. 75-80 % training data to 25-20 % test data [55].
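The 80/20 split can be sketched in a few lines. This is a plain random split with a fixed seed. Whether the study shuffled or stratified the data is not stated, so both the shuffling and the rounding rule (test set size rounded up) are assumptions of this sketch.

```python
import math
import random


def train_test_split(items, test_share=0.2, seed=0):
    """Shuffle the items reproducibly and split them into train and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(test_share * len(shuffled))  # assumption: round up
    return shuffled[n_test:], shuffled[:n_test]


# Component indices standing in for the 1,506 in-house components.
components = range(1506)
train, test = train_test_split(components)
print(len(train), len(test))
```

With a class share of roughly 24 % ASDs, a stratified split would be a natural alternative to keep the class ratio identical in both sets.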

Modeling
The subsequent modeling phase covered the development of ML models and the calibration of the hyperparameters to optimal values [44]. All ML models predict assembly start delayers using a binary classification, which was identified as the best modeling approach in our previous research [10]. Thus, components are classified as "assembly start delayer" or as "no assembly start delayer." To ensure the comparability of all ML models, we chose the same set of ML algorithms on both data sets. In detail, we compared the performance of a Support Vector classifier (SVC), a Decision Tree (DT) classifier, a Random Forest (RF) classifier, an Adaptive Boosting (AdaBoost) classifier utilizing a DT-classifier as a base estimator, a Gradient Boosting (GB) classifier and a Multilayer Perceptron (MLP), since they are established approaches for binary classifications [56][57][58]. For the MLP, specifically, a double hidden layer feedforward net with stochastic gradient descent (SGD) optimizer was applied. The number of nodes was 46 nodes on the input layer to cover all input features after performing One-Hot-Encoding and PCA, 50 nodes on each hidden layer, and one node on the output layer for the binary classification. The number of hidden layers, the number of nodes on the hidden layers, and the activation function on the hidden layers were defined by continuous optimization of the model quality. In detail, we compared different network architectures ranging from one to ten hidden layers with 1 to 100 nodes per hidden layer. The best network structure was the above-mentioned double hidden layer net. As activation function for the output layer, a sigmoid function was chosen, which is particularly suitable for binary classifications [59]. For the hidden layers, we applied a ReLU function as activation function after comparing it with the sigmoid function, tanh function and He function regarding the reached model qualities. 
All classification models were implemented in Python 3.7 utilizing the Scikit-learn library and Keras library. An overview of the optimized hyperparameters used in each of the classification models is given in the appendix in Tables 4 and 5.
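To make the described network structure concrete, the following sketch builds a feedforward pass with 46 inputs, two hidden layers of 50 ReLU nodes, and one sigmoid output node in plain Python. It is not the Keras implementation used in the study: the weights are random and untrained, and the initialization range is an arbitrary assumption.

```python
import math
import random


def relu(x):
    return x if x > 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Random weights (arbitrary small range, an assumption) and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, layer, activation):
    """Fully connected layer: activation(W x + b) for every output node."""
    weights, biases = layer
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


rng = random.Random(0)
sizes = [46, 50, 50, 1]  # input, two hidden layers, output
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]


def predict_proba(features):
    """Forward pass; the output can be read as the probability of class ASD."""
    hidden = dense(features, layers[0], relu)
    hidden = dense(hidden, layers[1], relu)
    return dense(hidden, layers[2], sigmoid)[0]


n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
print(n_params)
print(predict_proba([0.5] * 46))
```

The network has 46·50 + 50 + 50·50 + 50 + 50 + 1 = 4,951 trainable parameters, which is small relative to the 1,506 components but still large enough that the train/test separation matters.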
In summary, we created 12 different prediction models to classify components as ASD or NASD. These 12 models differed in the utilization of material data (including or excluding material data) and in the utilized ML algorithm (six algorithms per material data case). The target was to quantify the effect of utilizing material data on the quality of a model predicting assembly start delayers while comparing different ML algorithms, which is a novel approach compared to recent literature. As metrics to evaluate the model quality, we used the MCC and F-score based on a confusion matrix.

Evaluation of model application
In the evaluation phase, the applied models were thoroughly evaluated to check whether they meet the target of our data mining approach [44]: quantifying the impact of material data on the quality of a model predicting assembly start delayers. Thus, we split each of the two data sets (including and excluding material data) into a train and a test data set. Subsequently, we trained and tuned all ML algorithms on the train data sets and then evaluated the achieved model quality on the two test data sets. The results are documented in Table 3.
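The evaluation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the data are synthetic, the column indices in MATERIAL_COLS are hypothetical stand-ins for the material data fields, all hyperparameters except the MLP layout are left at library defaults instead of being tuned, the One-Hot-Encoding and PCA preprocessing is omitted, and Scikit-learn's MLPClassifier is swapped in for the Keras network.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import fbeta_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

warnings.filterwarnings("ignore")  # silence convergence noise in this toy run


def make_classifiers(seed=0):
    """Return fresh, untuned instances of the six algorithm families."""
    return {
        "SVC": SVC(random_state=seed),
        "DT": DecisionTreeClassifier(random_state=seed),
        "RF": RandomForestClassifier(n_estimators=50, random_state=seed),
        # The first positional argument is the base estimator (a decision tree).
        "AdaBoost": AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                       n_estimators=50, random_state=seed),
        "GB": GradientBoostingClassifier(random_state=seed),
        # Two hidden layers with 50 nodes each, ReLU, SGD optimizer.
        "MLP": MLPClassifier(hidden_layer_sizes=(50, 50), activation="relu",
                             solver="sgd", max_iter=300, random_state=seed),
    }


# Synthetic, imbalanced stand-in data (assumption: label 1 means "delayer").
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           weights=[0.8, 0.2], random_state=0)

MATERIAL_COLS = [8, 9]  # hypothetical positions of the material data features
all_cols = list(range(X.shape[1]))
feature_sets = {
    "including material data": all_cols,
    "excluding material data": [c for c in all_cols if c not in MATERIAL_COLS],
}

# One stratified split, reused for both data cases so the results are comparable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

results = {}
for case, cols in feature_sets.items():
    for name, clf in make_classifiers().items():
        clf.fit(X_train[:, cols], y_train)
        y_pred = clf.predict(X_test[:, cols])
        results[(case, name)] = {
            "MCC": matthews_corrcoef(y_test, y_pred),
            "F2": fbeta_score(y_test, y_pred, beta=2),
        }

for (case, name), scores in results.items():
    print(f"{name:9s} {case}: MCC={scores['MCC']:.2f} F2={scores['F2']:.2f}")
```

Because the data are synthetic, the printed scores say nothing about the real case; the sketch only shows how six algorithms and two data cases yield 12 models that are scored on held-out test data.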
Upon evaluating the metrics, it is particularly noticeable that the models trained on the data set including material data achieved the best results. Furthermore, the best results per data set were both achieved by the GB classifier. With an MCC of 0.67 and an F2-score of 77 %, the GB classifier utilizing material data outperformed the GB classifier not utilizing material data, which reached an MCC of 0.62 and an F2-score of 71 %. Thus, comparing the best ML model per data set already indicates a dependence of the model quality on the material data.
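For readers unfamiliar with the two metrics, the following sketch computes the MCC and the F2-score from the four cells of a confusion matrix using their standard definitions. The counts in the example are hypothetical and are not taken from Table 3; "positive" is assumed to denote the class "assembly start delayer."

```python
from math import sqrt


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when a row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score; beta=2 weights recall higher than precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical test-set counts (illustrative only).
tp, fp, fn, tn = 8, 4, 2, 86
print(f"MCC = {mcc(tp, fp, fn, tn):.3f}")
print(f"F2  = {f_beta(tp, fp, fn):.3f}")
```

With beta = 2, a missed delayer (false negative) lowers the score more than a false alarm (false positive), while the MCC uses all four cells and therefore remains informative on imbalanced classes.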
Additionally, we created boxplots showing the spread in the F2-score and MCC of all ML models utilizing the two different data sets (cf. Fig. 4). With the boxplots, the overall dependency of the model quality on the material data, independent of the considered ML algorithm, was visualized. The distribution of the F2-scores and MCCs of the ML models trained on the data set including material data differed from the respective distribution of the ML models trained on the data set excluding material data. This indicated that, overall, the ML models trained on the data set including material data performed better than those excluding material data. Thus, the comparison of the overall spread of the ML models reinforces the indication that material data have an impact on the quality of models predicting assembly start delayers.
Fig. 3 Excerpt of the exploratory data analysis: Impact of gross weight and number of operations on assembly start delayers
Finally, we performed a statistical test to validate our working hypothesis. In detail, we performed two paired-samples t-tests, also referred to as dependent-samples t-tests, one for the MCC and one for the F2-score. A paired-samples t-test is used to assess whether the population means of two related samples differ. Thus, with the two paired-samples t-tests, we compared the means of the two samples 'ML models including material data' and 'ML models excluding material data' individually for the MCC and the F2-score. The samples are paired because the same ML algorithms were applied in both of them. Applying both tests revealed a p-value of approx. 0.003 for the MCC and of approx. 0.005 for the F2-score. Consequently, since both p-values were less than 0.05, the difference between the two samples in both the MCC and the F2-score was statistically significant. Accordingly, the impact of our considered material data on the model quality was statistically significant as well.
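The paired-samples t-test can be illustrated with a short sketch. The six score pairs below are made-up placeholders (one pair per ML algorithm), not the values from Table 3, and the function name is our own. The sketch computes the t statistic from the per-algorithm differences and compares it with the two-sided critical value for 5 degrees of freedom at a significance level of 0.05, which is 2.571.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for a paired-samples t-test."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical MCC values per algorithm (illustrative placeholders only).
mcc_with_material = [0.67, 0.60, 0.55, 0.58, 0.63, 0.50]
mcc_without_material = [0.62, 0.57, 0.50, 0.55, 0.59, 0.46]

T_CRIT_DF5 = 2.571  # two-sided critical value, alpha = 0.05, df = 5

t, df = paired_t_statistic(mcc_with_material, mcc_without_material)
print(f"t = {t:.2f} with df = {df}")
print("significant at 0.05" if abs(t) > T_CRIT_DF5 else "not significant")
```

The test operates on the differences within each pair, so a small but consistent improvement across all algorithms can be significant even when the absolute gain is modest. A p-value instead of a critical-value comparison can be obtained, for example, with scipy.stats.ttest_rel.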
Consequently, the working hypothesis could be confirmed. The model quality significantly increased when material data were considered. However, in our case, the increase was only slight, with an average of 0.04 in the MCC and an average of 3 % in the F-score. Thus, we further analyzed possible explanations for this small impact and hypothesized prospects to further increase the benefit of utilizing material data. The reason for the small observed impact of material data could be that the considered material data (gross weight and component name) contain too little information to describe the characteristics of the components. Other information about the component, such as dimensions, volume, or the number and specification of features in the component's CAD model like drill holes, shaft shoulders, radii, or surface roughness, could further increase the impact of material data. For example, the transportation, stocking, and handling effort of a component does not solely depend on its weight, but also on other characteristics like dimensions and volume. For instance, the dimensions of a component determine whether it can be easily transported by a forklift or crane and can thus affect transport times. Furthermore, the number and specification of a component's features indicate its complexity and its need for special processing operations, which influence the processing time. Thus, considering additional material data could increase the model quality.
In summary, we could answer our research question with our main contribution: the quality of an ML-based model predicting assembly start delayers increases significantly when material data are used. Thus, our study showed that models predicting assembly start delayers benefit from utilizing material data. In our exemplary case, we included the material data gross weight and component name in our prediction model, which significantly increased the model quality. With these results, our approach is the first to systematically analyze the influence of material data on model quality in predicting assembly start delayers.

Limitations and implications for further research
In this work, we only considered one single machine and plant manufacturer as an exemplary case. Although a case-based approach is common in the field of machine learning (see, for example, [10,37,39,40,41]), the findings might remain case-specific and might not be generalizable. Accordingly, future research should validate the achieved findings considering additional machine and plant manufacturers in further case studies. Nevertheless, in our work, we were able to show that material data have a positive influence on model quality for predicting assembly start delayers. However, the verifiable influence of the material data on the model quality was only small. We suspect that the small number of data fields taken from the material data is a possible reason for this. Further material data could improve the model quality and thus strengthen the influence of the material data. Accordingly, future research should set up a model to predict assembly start delayers with additional material data.
The addition of further material data could also improve the generally low model quality. With a maximum MCC of 0.67 and a maximum F2-score of 77 %, the model quality is still too low for a successful practical application of the model, as there are still many false positive and false negative predictions. In general, the model quality depends on the input data, the utilized ML algorithm, and the complexity of the modeling approach [11,60,61,62]. Together with our previous study [10], we already analyzed several different ML algorithms on each of the four levels of detail and modeling approaches. Thus, we infer that neither further optimization of the ML algorithm nor of the modeling approach used is likely to lead to a significant improvement of the model quality. Instead, we infer that an enhancement of the input data could further improve the overall model quality, as the database also has an essential influence on the model quality [11,12]. In our study, we already showed that material data influence the model quality. Consequently, we encourage further studies to consider additional data fields from the area of material data when setting up a model predicting assembly start delayers to further optimize the model. Together with our previous work in the same research field [10], our findings are a good starting point for the prediction of assembly start delayers and for analyzing the influence of material data on the model quality. As we could easily access the considered material data and integrate them into our data set, we added value to our model without much additional effort for data acquisition. Consequently, we have shown that it is worth also considering data that might not appear to influence the model quality at first glance and that are therefore not commonly used.
Fig. 4 Boxplot of MCC and F-Score for all prediction models
For future research in the field of applied machine learning, the elaboration of the database should be extended to other easily accessible data sources, even if they are not typically considered for the respective use case.

Conclusion
At machine and plant manufacturers, the manufacturing process typically begins with the design of the product and its components before planning and executing the production process to manufacture the individual components needed in the subsequent assembly process. An essential factor for meeting a delivery date is the start of the assembly on time and a prior timely supply of the components needed for assembly. Consequently, components produced in the processes upstream of the assembly have a direct influence on the performance of the assembly process. To meet delivery dates, we set up a supervised learning model to predict potentially delayed individual components (we call them assembly start delayers) in the early stages of the manufacturing process. Currently, machine learning models in the related area of lead time prediction typically include information about the system status, the machines, and the orders in their prediction model and do not consider material data [9]. As the design of a product is a central process for machine and plant manufacturers and the components are typically tailor-made to meet the customer's needs, we assumed that material data influence the model quality. Thus, we formulated the following working hypothesis: "The model quality for the prediction of assembly start delayers increases when utilizing material data." To verify the working hypothesis, we applied the established CRISP-DM procedure at an exemplary chosen machine and plant manufacturer. Here, we created 12 different prediction models to classify components as "assembly start delayer" or "no assembly start delayer." These 12 models differed in the utilization of material data (including or excluding material data) and in the utilized ML algorithm (six algorithms per material data case). The target was to quantify the effect of utilizing material data on the quality of a model predicting assembly start delayers while comparing different ML algorithms.
As metrics to evaluate the model quality, we used the MCC and F-Score based on a confusion matrix.
Evaluating the different quality metrics of the 12 prediction models revealed a positive impact of the material data on the model quality. Thus, the working hypothesis could be confirmed. However, in our case, there was only a slight increase in the MCC and F-score. As a possible explanation for the small impact on the model quality, we suspect the limited information about the material considered in our model (gross weight and component name only). Adding further information about the material, such as dimensions, volume, or the number and specification of features in the component's CAD model like drill holes, shaft shoulders, radii, or surface roughness, could further increase the impact of material data. Nevertheless, even with our limited consideration of material data, we verified that utilizing data that are commonly not used in prediction models increases the model quality.
In total, we successfully analyzed the impact of material data on the quality of models predicting assembly start delayers and gave insights into the performance of different modeling approaches. With our results, we achieved our two main contributions: First, we developed a model to predict assembly start delayers utilizing a machine learning classification approach. Second, we identified that material data influence the model quality of a model predicting assembly start delayers. However, there was only a slight influence. With our findings, for future machine learning approaches in the area of production planning and control, we recommend considering data sources apart from typically used data sources as well. We were able to show that even atypical data sources can contribute to an improvement of the model.
Funding Open Access funding enabled and organized by Projekt DEAL.

Data availability Not applicable.
Code availability Not applicable.

Conflicts of interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.