Reduction: Insights from the MONSOON Industry 4.0 Project

The proliferation of cyber-physical systems and the advancement of Internet of Things technologies have led to an explosive digitization of the industrial sector. Driven by the high-tech strategy of the federal government in Germany, many manufacturers across all industry segments are accelerating the adoption of cyber-physical system and Internet of Things technologies to manage and ultimately improve their industrial production processes. In this work, we focus on the EU-funded project MONSOON, which is a concrete example where production processes from different industrial sectors are to be optimized via data-driven methodology. We show how the particular problem of waste quantity reduction can be addressed by means of machine learning. The results presented in this paper are useful for researchers and practitioners in the field of machine learning for cyber-physical systems in data-intensive Industry 4.0 domains.


Introduction
The proliferation of cyber-physical systems and the advancement of Internet of Things technologies have led to an explosive digitization of the industrial sector. Driven by the high-tech strategy of the federal government in Germany, many manufacturers across all industry segments are accelerating the adoption of cyber-physical system and Internet of Things technologies to manage and ultimately improve their industrial production processes.
The EU-funded project MONSOON (MOdel-based coNtrol framework for Site-wide OptimizatiON of data-intensive processes) is a concrete example where production processes from different industrial sectors, namely process industries from the sectors of aluminum and plastic, are to be optimized via data-driven methodology.
In this work, we are focusing on a specific use case from the plastic industry. We use sensor measurements provided by the cyber-physical systems of a real production line producing coffee capsules and aim to reduce the waste quantity, i.e., the number of low-quality production cycles, in a data-driven way. To this end, we model the problem of waste quantity reduction as a two-class classification problem and investigate different fundamental machine learning approaches for detecting and predicting low-quality production cycles. We evaluate the approaches on a data set from a real production line and compare them in terms of classification accuracy.
The paper is structured as follows. In Section 2, we describe the production process and the collected sensor measurements. In Section 3, we present our classification methodology and discuss the results. In Section 4, we conclude this paper with an outlook on future work.

Production Process and Sensor Measurements
One particular research focus in the scope of the project MONSOON lies on the plastic sector, where the manufacturing of polymer materials (coffee capsules) is performed by the injection molding method. Injection molding is a manufacturing process that produces plastic parts by injecting raw material into a mold. The process first heats the raw material, then closes the mold and injects the hot plastic. After the holding pressure phase and the cooling phase, the mold is opened again and the plastic parts, i.e., coffee capsules in our scenario, are extracted. In this way, each injection molding cycle produces one or multiple parts. Ideally, the defect rate of each cycle tends toward zero with a minimum waste of raw material. In practice, only cycles with a defect rate below a certain threshold are acceptable to the manufacturer. In order to elucidate the manufacturing process, we schematically show the parts and periphery of a typical injection molding machine in Figure 1. As can be seen in the figure, the injection molding machine comprises different parts, among which the plastification unit forms the core of the machine, and controllers that allow the production process to be steered.
The MONSOON Coffee Capsule and Context data set [2] utilized in this work comprises information about 250 production cycles of coffee capsules from a real injection molding machine. It contains 36 real-valued attributes reflecting the machine's internal sensor measurements for each cycle. These measurements include values about the internal states, e.g., temperature and pressure values, as well as timings of the different phases within each cycle. In addition, we also take into account quality information for each cycle, i.e., the number of non-defective coffee capsules, which varies across individual production cycles. If the number of non-defective coffee capsules produced in a cycle is larger than a predefined threshold, we label the corresponding cycle with high.quality, otherwise we assign the label low.quality. The decision about the quality labels was made by domain experts.
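The labeling rule can be illustrated with a minimal Python sketch. The threshold value and the per-cycle counts below are hypothetical, since the value chosen by the domain experts is not stated here; only the two label names are taken from the text.

```python
# Hypothetical threshold: the value set by the domain experts is not given
# in the text, so 40 is used purely for illustration.
QUALITY_THRESHOLD = 40


def label_cycle(good_capsules: int, threshold: int = QUALITY_THRESHOLD) -> str:
    """Return the class label for one production cycle.

    A cycle is high.quality only if the number of non-defective capsules
    is strictly larger than the threshold.
    """
    return "high.quality" if good_capsules > threshold else "low.quality"


# Made-up counts of non-defective capsules for five cycles.
counts = [48, 40, 35, 47, 41]
labels = [label_cycle(c) for c in counts]
```

Note that a cycle whose count equals the threshold is labeled low.quality, because the rule requires the count to be larger than the threshold.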
Based on this data set, we benchmark different fundamental machine learning approaches and their capability of classifying low-quality production cycles based on the aforementioned sensor measurements. The methodology and results are described in the following section.

Application of Machine Learning in Plastic Industry
By applying machine learning to the sensor measurements gathered from a production line of coffee capsules equipped with cyber-physical systems, we aim at detecting and predicting low-quality production cycles. For this purpose, we first preprocess the data by centering and scaling the attributes and additionally excluding attributes with near-zero variance. Preprocessing was implemented in the programming language R based on the CARET package [7].
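The preprocessing steps can be sketched as follows. This is a standalone Python illustration, not the R/CARET code used in this work: the filter is a plain variance threshold, which is a simplification and not necessarily the criterion applied by the CARET package, and the attribute names and values are hypothetical.

```python
from statistics import mean, stdev, variance


def drop_near_constant(columns, min_variance=1e-8):
    """Keep only attributes whose sample variance exceeds min_variance.

    Assumption: a simple variance threshold stands in for the
    near-zero-variance filter described in the text.
    """
    return {name: vals for name, vals in columns.items()
            if variance(vals) > min_variance}


def center_and_scale(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical attributes for four cycles; "mold_id" is constant.
columns = {
    "melt_temperature": [230.0, 232.0, 231.0, 229.0],
    "injection_pressure": [80.0, 82.0, 78.0, 80.0],
    "mold_id": [1.0, 1.0, 1.0, 1.0],
}

# Filter first, so that scaling never divides by a zero standard deviation.
kept = drop_near_constant(columns)
scaled = {name: center_and_scale(vals) for name, vals in kept.items()}
```

After these steps, each remaining attribute has zero mean and unit sample standard deviation, so that no single sensor dominates a distance-based classifier merely because of its measurement unit.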
Based on the preprocessed data set, we measured the classification performance in terms of balanced accuracy, precision, recall, and F1 score via repeated k-fold cross-validation, where we set the number of folds to 5 and the number of repetitions to 100. That is, we used 80% of the data set as training data and the remaining 20% as testing data for predicting the quality of the production cycles. We averaged the performance over 100 randomly generated training sets and test sets.
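The evaluation protocol can be sketched in plain Python as shown below. This is an illustration under stated assumptions rather than the CARET procedure itself: the folds are drawn by simple shuffling without stratification, low.quality is treated as the positive class, and the small label vectors are made up.

```python
import random


def repeated_kfold_indices(n_samples, n_folds=5, n_repeats=100, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for f in range(n_folds):
            test_idx = order[f::n_folds]          # every n_folds-th sample
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield train_idx, test_idx


def binary_metrics(y_true, y_pred, positive="low.quality"):
    """Compute balanced accuracy, precision, recall, and F1 for two classes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# 250 cycles, 5 folds: every split has 200 training and 50 test cycles.
splits = list(repeated_kfold_indices(250))

# Made-up labels to exercise the metric computation.
y_true = ["low.quality", "low.quality", "high.quality", "high.quality", "high.quality"]
y_pred = ["low.quality", "high.quality", "high.quality", "high.quality", "low.quality"]
metrics = binary_metrics(y_true, y_pred)
```

Balanced accuracy averages the recall of both classes, which makes it less sensitive to an uneven share of low-quality cycles than plain accuracy.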
We investigated the following fundamental predictive models, all implemented via the CARET package in R:
- k-Nearest Neighbor [4]: A simple non-parametric and thus model-free classification approach based on the Euclidean distance.
- Naive Bayes [5]: A probabilistic approach that assumes the independence of the attributes.
- Classification and Regression Trees [9]: A decision tree classifier that hierarchically partitions the data.
- Random Forests [3]: A combination of multiple decision trees in order to avoid over-fitting.
- Support Vector Machines [11]: An approach that aims to separate the classes by means of a hyperplane. We investigate both a linear SVM and an SVM with an RBF kernel function.
We evaluated the classification performance of the predictive models described above based on the injection molding machine's internal states, which are captured by the sensor measurements. The corresponding classification results are summarized in Table 1. As can be seen from the table, all predictive models reach a classification accuracy of at least 63%, while the highest classification accuracy of approximately 69% is achieved by the k-Nearest Neighbor classifier. For this classifier, we utilized the Euclidean distance and set the number of nearest neighbors k to a value of 7. In fact, the k-Nearest Neighbor classifier is able to predict the correct quality labels for 172 out of 250 cycles on average.
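To make the best-performing configuration concrete, the following Python sketch shows a k-Nearest Neighbor classifier with Euclidean distance and k = 7. It is a minimal illustration and not the CARET implementation used in this work; the two-dimensional training points are made up, whereas the real data set has 36 attributes per cycle.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=7):
    """Predict a label by majority vote among the k nearest training points.

    Distances are Euclidean; with two classes and an odd k, the vote
    cannot end in a tie.
    """
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up, already scaled sensor vectors for ten cycles.
train_X = [
    (0.0, 0.0), (0.1, 0.2), (-0.2, 0.1), (0.3, -0.1), (0.0, 0.3),
    (3.0, 3.0), (3.1, 2.9), (2.8, 3.2), (3.2, 3.1), (2.9, 2.8),
]
train_y = ["high.quality"] * 5 + ["low.quality"] * 5

prediction_a = knn_predict(train_X, train_y, (0.1, 0.1))
prediction_b = knn_predict(train_X, train_y, (3.0, 3.1))
```

For the first query, the seven nearest neighbors consist of five high.quality and two low.quality cycles, so the majority vote yields high.quality; the second query is decided analogously in favor of low.quality.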
It is worth noting that even this rather low classification accuracy (69%) might have a high impact on the real production process, since in our particular domain hundreds of coffee capsules are produced every minute, such that even a small improvement in waste quantity reduction will lead to a major reduction in production costs. In addition, we have shown that the performance of the k-Nearest Neighbor classifier can be improved to a value of 72% when enriching the sensor measurements with additional process parameters [2].
To conclude, the empirical results reported above indicate that even a simple machine learning approach such as the k-Nearest Neighbor classifier is able to predict low-quality production cycles and thus to enhance the waste quantity reduction. Although the provided sensor measurements are of limited extent regarding the number of measurements, we believe that our investigations will be helpful for further data-driven approaches in the scope of the project MONSOON and beyond.

Conclusions and Future Work
In this work, we have focused on the EU-funded project MONSOON and have shown how the particular problem of waste quantity reduction can be addressed by means of machine learning. We have applied fundamental machine learning methods to the sensor measurements from a cyber-physical system of a real production line in the plastic industry and have shown that predictive models are able to exploit optimization potentials by predicting low-quality production cycles. Among the investigated predictive models, we have empirically shown that the k-Nearest Neighbor classifier yields the highest prediction performance in terms of accuracy.
As future work, we aim at investigating different preprocessing methods and ensemble strategies in order to improve the overall classification accuracy. We also intend to evaluate different distance-based similarity models [1] for improving the performance of the k-Nearest Neighbor classifier. In addition, we intend to extend our performance analysis to other industry segments, for instance the production of surface-mount devices [10], and to investigate metric access methods [8,12] as well as ptolemaic access methods [6] for efficient and scalable data access.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.