1 Introduction

Production management today faces numerous challenges such as increasing uncertainty and simultaneously growing complexity (Westkämper and Löffler 2016). Shorter product life cycles, individualization, and disruptive technological innovations require efficient implementation of changes (Schuh et al. 2017). The potential of the IoP for production management lies in providing data-driven decision support on all levels of managing production in volatile and uncertain business environments (Schuh et al. 2019a). Short-term production management focuses in particular on decision support in time-sensitive scenarios on or near the shop floor. Therefore, the aim of the research work is to learn and profit from historical data by developing self-learning production systems and, as a result, to significantly increase the decision-making quality and the decision-making speed in production environments (Müller et al. 2022). This is important to ensure the robustness of production processes by quickly making decisions and implementing appropriate measures (Stricker et al. 2015).

For this purpose, data and analysis latencies are to be minimized through the integration of continuous cross-domain data access and the development and combination of diagnostic, predictive, and prescriptive analytics models. Moreover, decision and implementation latencies are to be reduced by means of an appropriate collaboration of autonomous processes and model-based decision support as well as the implementation of suitable measures in the production system.

The practical realization of such decision support takes place through the development of a Production Control Center as shown in Fig. 1, in which interlinked applications contribute to increasing decision-making quality and speed in the production environment. Context-specific data from the IoP data lake is used in the sense of a control loop to generate data-driven transparency via the various applications with regard to emerging adjustment needs and to address these by deriving and implementing suitable measures.

Fig. 1 Production Control Center

The five applications developed (cf. Fig. 1) focus in particular on use cases near the shop floor with an emphasis on the key topics of production planning and control, production system configuration, and quality control loops. The specific challenges, the methods used, and the results obtained through interdisciplinary research are described in detail in the following Sects. 2, 3, 4, 5, and 6. A summary and outlook are given in Sect. 7. The five applications described in this paper certainly do not address all possible challenges and problems in short-term production management, which is why further icons for future linked applications are already included in the proposed Production Control Center (see Fig. 1).

2 Intelligent Production Management Through Predictive Quality

In order to continuously improve process- and product-related quality, data-based methods for decision support in production are being investigated as part of the Intelligent Production Management through Predictive Quality (PQ) application. The focus is on data analysis for PQ, which enables an early prediction of quality deviations and production defects as well as the identification of the underlying causes. This information can then be used to derive target-oriented corrective measures. As shown in Fig. 2, primarily production processes with two or more production steps are considered. This enables the investigation and development of approaches that yield predictions spanning several process steps as well as the identification of interactions between different process steps (Schäfer et al. 2019).

Fig. 2 Intelligent Production Management through Predictive Quality

2.1 State of the Art

Currently, existing quality management methods are progressively supplemented with data-based approaches to face the challenges arising with increasingly complex products. One of the main challenges in implementing data-based decision support through PQ is the pre-processing and integration of diverse data sources (Groggert et al. 2017). These sources come with a variety of formats and data types (Wang 2017). Common data management methods, such as the Data Warehouse (Bauer and Günzel 2013) and Smart Factory Information (Yoon et al. 2019), mostly address the technical implementation rather than a clear data structure, which is needed for PQ applications. This results in the need for a data model with a comprehensive data structure. Various information modeling standards already exist; however, they omit standardized instructions on how to perform the modeling process (Sudarsan et al. 2005). Moreover, no product-centric models for manufacturing data could be found in the literature so far.

Utilizing data structured by a product-centric data model, PQ is able to derive product- and process-oriented predictions about quality using data analytics methods. To subsequently optimize quality, it is crucial to gain insights into the trained model (Cramer et al. 2021). Model-agnostic methods make it possible to detect to what extent the model prediction depends on the different input variables and to compare different types of models (Vilone and Longo 2021). A systematic investigation of these methods with regard to their applicability in the context of PQ has not yet been conducted (Goldman et al. 2021).

2.2 Approach and Methods

The predictive capabilities of the PQ application will empower the operator to improve product and process quality. Automating these operations requires a universal, process-independent data model; especially in cross-process approaches (cf. Fig. 2), the heterogeneity of the processes and the associated data leads to problems during analyses. To solve these problems, a comprehensive meta-model for production data (MMPD) was developed by Cramer et al. (2021), which allows the derivation of production-related data models. These universal, yet application-specific data models ensure compatibility between the data and the required data analysis pipelines for PQ applications. The MMPD is a product-centric model and focuses on a holistic view of product-related data. The metadata provides the ability to incorporate the domain- or application-specific context required to accurately interpret the data points (cf. Fig. 3). Uniform interfaces and standards for data integration and consolidation procedures allow product-centric PQ applications to access only the data and information they require. In this way, the MMPD, together with the automated data analysis pipeline built on it, serves as the basis for a PQ application ecosystem.
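To illustrate what a product-centric data model derived from the MMPD could look like in code, the following minimal sketch uses Python dataclasses; the class and field names (Product, ProcessStep, Measurement) are illustrative assumptions and do not reproduce the published meta-model.

# Illustrative sketch only: a minimal product-centric data structure in the
# spirit of the MMPD. Names and fields are assumptions, not the actual
# meta-model published by Cramer et al. (2021).
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

@dataclass
class Measurement:
    name: str                      # e.g., "melt_temperature"
    value: float
    unit: str
    timestamp: datetime
    metadata: dict[str, Any] = field(default_factory=dict)  # sensor, tolerance, ...

@dataclass
class ProcessStep:
    step_id: str                   # e.g., "injection_molding"
    machine_id: str
    parameters: dict[str, float] = field(default_factory=dict)
    measurements: list[Measurement] = field(default_factory=list)

@dataclass
class Product:
    product_id: str                # the central, product-centric key
    variant: str
    process_steps: list[ProcessStep] = field(default_factory=list)
    quality_labels: dict[str, float] = field(default_factory=dict)  # e.g., {"tensile_strength": 44.1}

Structuring the data around the product in this way is what allows a single analysis pipeline to join parameters and measurements across several process steps for one part.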

Fig. 3 Extract of the MMPD on the left; Partial Dependence Plot (PDP, marked in blue) with Individual Conditional Expectation (ICE, marked in gray) lines on the right

To provide decision support in the optimization of production processes and quality improvement, the most important process parameters are identified and investigated. A prerequisite for the investigation of important features or parameters in the production process are accurate prediction models. The prediction models are used as a proxy for a simulation or a digital shadow of the production line, and it is assumed that a good prediction model captures all the intricacies of the production process that can reveal opportunities for optimization. These prediction models are trained in the data analysis pipeline discussed above, with the option of more specific or more complex model specifications where required.

The most influential parameters are identified with feature importance methods on three levels of complexity. The first and most intensively researched level of investigation is singular feature importance. Singular features can indicate the parameters most influential to the prediction and, by proxy, to the overall quality. The second level of the feature investigation refers to the identification of interactions between features in the model. This could refer to parameters in one production step, but the more valuable outcome is finding interactions across production steps. This means the intervention or optimization point can be moved to the earliest possible step in the production line. The third level of feature importance is related to causal inference and the generation of causal graphical models that capture all relationships between parameters in the production line.

An example of the first level of investigation is the partial dependence plot (PDP) (Friedman 2001), as the four examples in Fig. 3 show. The PDP displays the average relationship between the different values of a considered input feature and the predicted value of the target feature. For this purpose, marginalization is performed over the distribution of the remaining input features and the machine learning model prediction. As the other input features are marginalized, a function depending only on the feature of interest is obtained, including interactions with other input features. Figure 3 illustrates, for example, that higher values of input X1 lead to a higher model prediction. The PDP can also be used for interactions, including first- and second-order effects and indicating the effect on the outcome when two features are adjusted together. The PDP is enriched with Individual Conditional Expectation (ICE) plots, which indicate the prediction for different values of a feature of interest separately for each data point (Goldstein et al. 2015). ICE lines not parallel to the PDP indicate that there are interactions with other features. Figure 3 depicts that for input feature X1 the ICE lines run roughly parallel to the PDP, which indicates that the impact of feature X1 on the model prediction surpasses its interaction with other input features. For causality representation, undirected graphical models prove useful by representing interactions in a digestible format without committing to a direction of causality. Directed graphical models capture the directionality of the influences along the production line and provide a visual overview of all identified relationships.
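The following minimal sketch shows how such PDP and ICE curves can be generated with scikit-learn; the synthetic data and the feature names X1 to X3 are placeholders and do not stem from the project's production data.

# Minimal sketch: training a quality-prediction model and inspecting it with
# PDP/ICE curves as described above. Data and feature names are synthetic
# stand-ins for process parameters, not the project's data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                    # stand-ins for process parameters X1..X3
y = 2.0 * X[:, 0] + X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=500)

model = GradientBoostingRegressor().fit(X, y)

# kind="both" overlays the averaged PDP with per-sample ICE lines;
# ICE lines that are not parallel to the PDP hint at feature interactions.
PartialDependenceDisplay.from_estimator(
    model, X, features=[0, 1], feature_names=["X1", "X2", "X3"], kind="both"
)
plt.show()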

2.3 Results and Conclusion

The developed MMPD enables the efficient use of universal data analysis pipelines for production data. Based on feature importance methods, both main and interaction effects can be detected to build causal models for root cause analysis in the future. The results presented here serve as a baseline for further work on improving product- and process-related quality. For example, this includes the integration of measurement uncertainties in model building for quality prediction. In addition, the elaboration of a concrete approach and the development of methods for the creation of causal models for production processes to determine the causes of predicted defects and quality deviations will be examined. Finally, a further research priority will be the definition of a practical way to integrate the data-based methods into established processes and workflows.

3 Enabling Decentralized Production by Objectifying Machine Setup Using Parameter Prediction

The events of recent years have changed the world of manufacturing. The Covid-19 pandemic required manufacturers of textile and plastic goods to flexibly and quickly switch their production to needed goods, such as masks or face shields (Missoni et al. 2021). Nowadays, due to globalization, companies operate in an increasingly volatile and uncertain environment and are often confronted with various types of disruptions.

One approach to address those issues is decentralized production . By switching from a centralized model with a single or few large production sites to a manufacturing environment with many smaller, widely distributed micro-factories, dependence on individual production sites is reduced and fast and flexible reactions to sudden, unforeseen events are enabled. Besides increased resilience, decentralized production networks offer many benefits, such as shorter delivery routes and times as well as a reduction in packaging material, reducing waste and increasing sustainability (Essers and Vaneker 2016; Morgan et al. 2021).

Two technologies, additive manufacturing (AM) and textile production, proved their adaptiveness at the beginning of the Covid-19 pandemic. While traditional supply chains could not keep up with the demand for personal protective equipment (PPE), a Czech manufacturer of 3D printers was able to ramp up mass production of face shields in just 3 days, in which dozens of prototypes were manufactured (Prusa Research 2022). By distributing the geometry files digitally, face shields could be produced globally at short notice. A similar observation was made in the textile industry. Clothing manufacturers in Germany switched their production to masks and protective equipment in a short time, producing up to 10,000 masks per day (Oertel 2020). Moreover, material suppliers and producers were connected via a newly founded platform (Schmelzeisen 2020).

To exploit the potential of decentralized production, the increasing complexity in production planning and control must be managed, and constant part quality must be guaranteed. This is increasingly difficult in a highly decentralized system, since the type of machines, the available resources, the environmental conditions, and the operator's skill level can vary heavily. This is compounded by the fact that, for the presented manufacturing technologies, many process parameters are available that influence the resulting part quality and are oftentimes not fully understood. Additionally, there is a shortage of skilled workers in the above-mentioned, highly knowledge-dependent industries. In summary, to harness the full potential of a decentralized production network, the individual process must be flexible while being reliable, and a defined, high part quality must be achievable, regardless of variations in machines, material, environment, or operator skill.

3.1 State of the Art

The freedom and flexibility in part production via AM also entail high process complexity in the form of many adjustable process parameters that influence the resulting part properties, like part strength and surface roughness, but also process factors, like manufacturing time. Those process parameters are typically adjusted for each part, based on expert knowledge or via a trial-and-error approach. Some parameters can have a significant effect on resulting part properties, such as the effect of part orientation on tensile strength. For example, one study found a 45.8% decrease in tensile strength between parts that were oriented horizontally and vertically on the build plate (Zaldivar et al. 2017). Currently, correlations between process parameters and part properties are mostly studied for each parameter individually. However, for a complete characterization of the process, interdependencies between parameters must be considered. For example, increasing layer height reduces the manufacturing time but increases surface roughness (Bintara et al. 2021), while reducing process speed has the inverse effect (Luzanin et al. 2013).

To handle the large number of adjustable process parameters and their influence on part properties in various manufacturing technologies, previous studies have utilized machine learning-based techniques (Hsieh 2006; Jagadish et al. 2019; Jang et al. 2016). While typically reporting high prediction accuracies, the presented methods are not easily scalable, need a lot of computing power for each prediction, and rely on very large sets of training data.

3.2 Approach and Methods

To objectify the setting of process parameters in situations where high decision speed is necessary and only a limited set of training data is available, an invertible neural network (INN) is set up to achieve a defined, high part quality.

The desired part quality can be achieved by several combinations of machine settings. Conventional (forward) neural networks determine the quality that can be achieved with one particular parameter setting. INNs allow the problem to be inverted so that combinations of parameter settings are suggested to achieve the desired quality. The term INN was introduced by Ardizzone et al. (2019). INNs differ in structure from conventional neural networks by their base layer, the invertible coupling layer (Dinh et al. 2017). In contrast to other neural networks, they can be inverted trivially. An advantage of using INNs in the AM and textile use cases is the possibility of further optimizing the production process according to certain criteria, such as production time or quality. Since different machine settings generating the same output are suggested, the most suitable ones for the specific task can be selected.
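A minimal sketch of such a coupling block, assuming a simple affine formulation in PyTorch, illustrates why inversion is trivial; it is an illustration of the principle, not the network architecture used in the project.

# Sketch of an affine coupling block (in the spirit of Dinh et al. 2017).
# The hidden-layer sizes and the overall structure are illustrative assumptions.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.half = dim // 2
        # Subnetworks predict scale and shift for the second half from the first half.
        self.scale_net = nn.Sequential(nn.Linear(self.half, hidden), nn.ReLU(),
                                       nn.Linear(hidden, dim - self.half), nn.Tanh())
        self.shift_net = nn.Sequential(nn.Linear(self.half, hidden), nn.ReLU(),
                                       nn.Linear(hidden, dim - self.half))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x[:, :self.half], x[:, self.half:]
        y2 = x2 * torch.exp(self.scale_net(x1)) + self.shift_net(x1)
        return torch.cat([x1, y2], dim=1)

    def inverse(self, y: torch.Tensor) -> torch.Tensor:
        # The inverse only needs the same subnetworks evaluated on the unchanged half.
        y1, y2 = y[:, :self.half], y[:, self.half:]
        x2 = (y2 - self.shift_net(y1)) * torch.exp(-self.scale_net(y1))
        return torch.cat([y1, x2], dim=1)

# Sanity check: forward followed by inverse recovers the input.
block = AffineCoupling(dim=6)
x = torch.randn(4, 6)
assert torch.allclose(block.inverse(block(x)), x, atol=1e-5)

Stacking such blocks yields a network whose inverse is available in closed form; in the parameter-prediction use case, the inverse direction can then suggest candidate machine settings for a desired part quality.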

To improve the applicability of extrusion-based AM as a method for producing high-quality plastic parts decentrally, a method for non-planar AM with variable layer height was developed. Using this method, the technology’s freedom, based on a layer-by-layer manufacturing approach, is retained, while typical shortcomings like high anisotropy and high surface roughness are addressed. This is done by deliberately curving layers in three-dimensional space instead of manufacturing those layers in a planar way, parallel to the build platform. Three-dimensional layers inside the part can be shaped such that mechanical loads on the part are taken in strand direction as opposed to perpendicular to the strands. Outer layers are used to accurately represent the desired geometry, including potential freeform surfaces. This way, surface roughness can be reduced by 76% (Pelzer and Hopmann 2021), while retaining a large layer height for the majority of the part, therefore reducing manufacturing time.

3.3 Results and Conclusion

The benefits of agile, quickly adaptable manufacturing processes were utilized at the beginning of the Covid-19 pandemic. To aid in the need for PPE, face shields were manufactured around the world using AM. Since most people were printing the forehead part and buying elastic straps for securely wearing the face shield, the latter were in short supply. By designing a 3D-printable elastic strap and setting up the associated manufacturing process through several quick iterations, a highly efficient process could be established in just 3 days. This way, it was possible to manufacture more than 800 elastic straps per day per machine. In combination with injection-molded and film-extruded parts, complete face shields could be produced in-house (Schmitz 2020). Similarly, designs for textile masks were elaborated and distributed to manufacturers who changed their production focus to masks. By setting up a supplier-manufacturer platform, it was possible to enable the exchange and distribution of close to 2 billion masks and 79 million pieces of protective clothing.

In a separate study, it was shown that using the developed INN for parameter prediction, it is possible to automatically generate sets of process parameters that are capable of accurately replicating the demanded part properties. In most cases, the accuracy of the tested part properties was within 82.76% to 99.98% of the demanded output (Pelzer et al. 2023). Only a few cases resulted in lower accuracies; however, these could be attributed to extreme combinations of demanded part properties and were identified beforehand as unlikely to succeed, regardless of the chosen parameters. These edge cases were used to identify the boundaries of achievable quality.

The research on non-planar AM shows that previously present conflicts, like the trade-off between manufacturing speed and surface roughness, can be resolved, resulting in a more capable manufacturing technology and higher quality parts.

In conclusion, it was shown that all necessary aspects for a decentralized production – agility and flexibility, part quality as well as reliability and objectivity in process setup – could be achieved. By combining all mentioned advances, the foundation for decentralized manufacturing is laid.

4 Reinforcement Learning in Production Scheduling

A general shift toward growing product individualization and more flexible production environments has led to significantly increased complexity in production management (Haeussler et al. 2020; Schuh et al. 2019b). Coping with smaller batch sizes, flexible material flows, and frequent disturbances on the shop floor creates additional requirements, especially for short-term production management (Lang et al. 2019). Conventional ERP systems cannot yet support these challenges sufficiently, so new systems continue to be developed, e.g., Advanced Planning Systems (Zijm and Regattieri 2019).

In addition to traditional optimization methods, recent approaches investigate the feasibility of applying learning-based methods, e.g., reinforcement learning (RL), to scheduling tasks in production (Xie et al. 2019). What most approaches have in common is the focus on the main control tasks of order release and dispatching. When their performance is compared to traditional methods used to solve such problems, e.g., ConWIP or Shifting Bottleneck, trained RL agents show promising solutions for scheduling tasks (Kemmerling et al. 2021). Rather than a purely academic investigation of RL in abstract scheduling tasks, the goal of the work presented here is to enable the use of RL approaches in realistic production scenarios by identifying remaining obstacles and addressing them.

4.1 State of the Art

During the last decades of research on production planning and control, many approaches and frameworks have been published (Wiendahl et al. 2005; Schuh 2012; Lödding 2016). In accordance with Lödding, general production control tasks with short-term influence on the production performance, e.g., order release and dispatching, still receive special attention in order to cope with the stated challenges (Kemmerling et al. 2021; Waschneck et al. 2018). As depicted in Fig. 4, the order release task determines the time and sequence in which orders are released for production and thus controls the actual input to the production system. Dispatching or sequencing determines the sequence in which orders are processed at each work system (Lödding 2016).

Fig. 4 Task of production control (Lödding 2016)

With a growing level of complexity, especially for flexible material flows and a high number of machines and orders, classical approaches like mathematical optimization were complemented by heuristics to reduce the scope of consideration (Samsonov et al. 2021). Due to the increasing operational use of assistance systems based on simulation, it becomes feasible to depict, and hence understand, a higher level of complexity than present methods could provide (Rabe et al. 2008). In the production context, discrete-event simulation is broadly used to map the production process including orders, resources, material flows, production plans, buffers, sequences, and performances (Fishman 2001). Discrete-event simulations provide the foundation for the application of learning-based methods such as RL.

The application of RL to scheduling problems in production is an emerging field of study with a wide range of different approaches being investigated. They differ in their structure as single-agent (Samsonov et al. 2021; Zhang et al. 2020) or multi-agent systems (Waschneck et al. 2018), use different kinds of algorithms such as value-based (Waschneck et al. 2018; Samsonov et al. 2021) and actor-critic methods (Zhang et al. 2020), and consider different ways of modeling state and action spaces. RL is well suited for scheduling problems because a strategy can be derived by direct interaction with unknown environments and without having to rely on externalized expert knowledge (Panzer and Bender 2022).

While the problem has been receiving increasing attention in the literature, the focus of present works tends to be on solving heavily abstracted problems rather than researching the transfer of RL systems to real production environments.

4.2 Approach and Methods

Solving a problem using RL requires formulating it as a sequential decision problem, in which an agent interacts with an environment by performing certain actions after observing the environment's state. The agent receives a reward depending on how well it solves the given problem and, during a training period, learns a strategy that maximizes its long-term rewards. The agent's observations in response to actions are typically computed by a simulation (Gosavi 2015). While commercial, widely accepted simulation tools for order release and other production scheduling problems exist, they generally do not provide interfaces which allow them to be used by common RL software. RL libraries and frameworks tend to be written in programming languages like Python, which offer advantages such as easy adaptability for research, but do not meet the standards required for direct implementation in an industrial application. Compatibility with commercial simulation tools is, however, of paramount importance to enable the use of RL in real production environments. To facilitate this, an interface based on network sockets was created for the practical application of the use case presented here (Kemmerling et al. 2021). This makes it possible for the RL agent created in Python to communicate directly with a simulation in the commercial tool Plant Simulation.
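The following simplified sketch indicates how such a socket-based bridge between a Python RL agent and an external simulation could look; the newline-delimited JSON message format, host, and port are assumptions for illustration and do not reflect the actual interface of the cited work or Plant Simulation's API.

# Simplified sketch of a socket bridge between a Python RL agent and an
# external discrete-event simulation. Message format, host, and port are
# illustrative assumptions.
import json
import socket

class SimulationBridge:
    def __init__(self, host: str = "localhost", port: int = 5005):
        self.sock = socket.create_connection((host, port))
        self.buffer = b""

    def _recv_json(self) -> dict:
        # Messages are assumed to be newline-delimited JSON objects.
        while b"\n" not in self.buffer:
            self.buffer += self.sock.recv(4096)
        line, self.buffer = self.buffer.split(b"\n", 1)
        return json.loads(line)

    def reset(self) -> dict:
        self.sock.sendall(b'{"command": "reset"}\n')
        return self._recv_json()               # initial observation of the shop floor

    def step(self, action: int) -> tuple[dict, float, bool]:
        msg = json.dumps({"command": "step", "action": action}) + "\n"
        self.sock.sendall(msg.encode())
        reply = self._recv_json()
        return reply["observation"], reply["reward"], reply["done"]

# An RL agent can then interact with the simulation as with any gym-style
# environment: obs = bridge.reset(); obs, reward, done = bridge.step(action)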

As the user acceptance of automated scheduling agents must be assured, an application has been developed to compare and visualize different order release scenarios based on their performance in terms of adherence to delivery dates and utilization of available resources. The integration of real problem cases into the application and the combination of the different functionalities in an online application, i.e., simulation, RL algorithm, and visualization for different scenarios, keeps the work focused on solving practical problems.

4.3 Results and Conclusion

Research performed during the development of the application presented here has investigated both order release (Kemmerling et al. 2021) as well as combined order release and sequencing problems (Samsonov et al. 2021) and demonstrated that RL agents can learn successful strategies to solve such problems. In addition, RL agents trained in this way have been shown to solve order release problems within the software Plant Simulation (Kemmerling et al. 2021), which is an important step toward practical use of RL in real-world scenarios. However, this transfer onto commercial simulation software also highlights the remaining challenges, which need to be overcome by RL solutions. These include incorporating further optimization objectives and constraints such as adherence to delivery dates , scaling the approach toward larger problem instances as they are encountered in real production scenarios, and transfer learning over different types of production. Further challenges lie in the investigation of how well RL solutions can perform disturbance management to appropriately respond to production interruptions and in examining how online optimization with RL can affect response times.

5 Process Analyzer – Weakness Detection in Event Logs

For companies, business process improvement is becoming more important (Schmelzer and Sesselmann 2020). One of the key tasks within business process improvement is weakness detection during the process analysis phase (Dumas et al. 2018). Being based on workshop formats and interviews, conventional approaches are time-consuming, cost-intensive (Schmelzer and Sesselmann 2020), and exposed to subjective influences (Bergener et al. 2015). For process mapping, process mining discovery algorithms can increase objectivity and reduce the effort by analyzing event logs (van der Aalst et al. 2021). For weakness detection in process analysis, however, methodological knowledge is needed to analyze an actual process flow and ensure applicability in practice (Bergener et al. 2015). The Deviation Detection application (Sect. 6) focuses on the automatic detection of deviations as well as root cause analysis using machine learning techniques, while here the focus is on user-defined deviations. The main objective is to bring user domain knowledge into the framework.

5.1 State of the Art

Various approaches from the literature aim to address the explained challenges. Authors like Bergener et al. (2015), Hoehenberger and Delfmann (2015), and Rittmeier et al. (2019) use weakness patterns that formalize knowledge about the structure of process weakness types and apply them to process models with pattern-matching algorithms. In approaches such as Outmazgin and Soffer (2016), this idea is applied to event logs, but only for specific workaround weakness types. Hence, considerable automation potential remains for identifying weaknesses in real business processes with low effort. Several process mining techniques for general weakness detection already exist but often rely on reference "to be" process models. The remaining challenge is to develop weakness models of generic business process weakness types. Their application to event logs enables weakness detection in as-is processes without a reference model and hence can reduce effort and subjectivity.

5.2 Approach and Methods

The Process Analyzer enables semi-automated detection of weaknesses in business and production processes based on event logs . To this end, domain expert knowledge on relevant process weakness types is transformed into weakness models, which are applied with algorithms to event logs.

A weakness model is the formalized description of a weakness type with regard to its characteristic properties (Schuh et al. 2021). The graphic description method IDEF0 (ICAM Definition for Function Modelling) is used as a framework for the modeling of process weakness types. IDEF0 models consist of five elements: Activity/Process, Input Information, Control Information, Resources, and Output Information (Presley and Liles 1995). Applied to weakness models, these elements become the weakness type, the data requirements necessary to detect a weakness, a mathematical description as a rule for detection, algorithmic functions that enable the application of the weakness model, and the shape of the identified weakness (e.g., event, tuple of events, …). Figure 5 shows a generic model for process weakness types.

Fig. 5 Elements of process weakness type model (Schuh et al. 2021)
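As an illustration, such a weakness model can be represented in code roughly as follows; the field names are assumptions for illustration and do not correspond to a published implementation.

# Illustrative sketch: one possible in-code representation of a weakness model
# following the elements described above. Field names are assumptions.
from dataclasses import dataclass
from typing import Callable
import pandas as pd

@dataclass
class WeaknessModel:
    weakness_type: str                                # e.g., "redundant activity"
    data_requirements: list[str]                      # required event log attributes
    rule_description: str                             # human-readable mathematical rule
    detect: Callable[[pd.DataFrame], pd.DataFrame]    # algorithmic function applied to an event log
    result_shape: str                                 # e.g., "event", "tuple of events"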

5.3 Results

Seven generic weakness types were derived from a systematic literature review, followed by a multi-criteria relevance assessment by Schuh et al. (2021): A redundant activity describes the repeated execution of a single activity within a process instance. A repetition of an activity sequence within a process instance is labeled as a backloop. Unwanted activities that occur at least once (e.g., printing) represent the weakness type unintentional activity. Parallelizable activities indicate a potential reduction in lead time in comparison to sequential execution. The potential for activity acceleration is addressed by the weakness type unsuitable execution time. A bottleneck is the activity in a process instance with the longest execution time. Transition times specify the time between two consecutive events, which is generally considered a process weakness.

Regarding the data requirements, all the mentioned weakness types require the basic event log attributes, process instance, activity, and time stamp for identification. Additionally, the weakness types unsuitable scope of activities, bottleneck, and transition time require start and end timestamps for each event.

The mathematical description of a weakness follows the consideration that algorithms must be able to process the information from event logs. To ensure practical relevance, the concept's data basis is the event log, which is a set of events stored in the information system. In this work, the mathematical rule-based description of an event i is defined as:

$$ i=\left(m,n,o\right)\ \textrm{or}\ i=\left(m,n,{o}_i,{o}_e\right) $$
(1)

with i = event; m(i) = process instance of event i; n(i) = activity name of i; o(i) = timestamp of i; o_i(i) = initial timestamp of i; o_e(i) = end timestamp of activity i

The given attributes m(i), n(i), and o(i) or o_i(i)/o_e(i) are variables; specific values of these attributes are indicated with "*". In the following, the mathematical description is derived for the example of the weakness type redundant activity. The set I(m*, n*) is defined as all events in the event log with a specific process instance m* and a specific activity name n*:

$$ I\ \left({m}^{\ast },{n}^{\ast}\right)=\left\{i\in I\mid m(i)={m}^{\ast}\wedge n(i)={n}^{\ast}\right\} $$
(2)

If the set I(m*, n*) contains more than one event, the activity occurs repeatedly in the process instance, leading to the mathematical description of a redundant activity:

$$ \mid I\left({m}^{\ast },{n}^{\ast}\right)\mid > 1\to I\left({m}^{\ast },{n}^{\ast}\right)=\textrm{``redundant activity''} $$
(3)

In practice, this means that the weakness type "redundant activity" exists if a process instance contains two events with an identical activity name. Based on the mathematical rule-based descriptions, Schuh et al. (2021) defined nine algorithmic requirements on how to apply the models to event logs. In the context of this paper, the requirements have been translated into pseudo-code, which is illustrated below for the weakness type of redundancy:

for each process instance in the set of process instances in the event log:
    for each activity name occurring in that process instance:
        if count of events with this activity name in the process instance > 1:
            return "redundant activity" found for this process instance and activity name

Using this structure, the requirements for an executable algorithm can be derived. For the Process Analyzer, algorithms were designed and tested using simulated data generated from real event logs (Pourbafrani et al. 2021a). The provided platform makes it possible to generate event logs with known deviations and to assess whether the formal definitions are able to catch these deviations.
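As an illustration, a minimal executable check for the redundant activity weakness type, following Eqs. (2) and (3), could look as follows; the event log is assumed to be a pandas DataFrame whose column names (process_instance, activity, timestamp) are placeholders for the basic attributes m, n, and o.

# Sketch of an executable check for the "redundant activity" weakness type.
# Column names of the event log DataFrame are illustrative assumptions.
import pandas as pd

def find_redundant_activities(event_log: pd.DataFrame) -> pd.DataFrame:
    """Return all (process instance, activity) pairs with |I(m*, n*)| > 1."""
    counts = (event_log
              .groupby(["process_instance", "activity"])
              .size()
              .reset_index(name="occurrences"))
    return counts[counts["occurrences"] > 1]

# Example usage on a toy event log:
log = pd.DataFrame({
    "process_instance": ["A", "A", "A", "B"],
    "activity": ["check order", "print invoice", "print invoice", "check order"],
    "timestamp": pd.to_datetime(["2023-01-01 08:00", "2023-01-01 08:10",
                                 "2023-01-01 09:00", "2023-01-01 08:05"]),
})
print(find_redundant_activities(log))   # flags "print invoice" in instance A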

5.4 Conclusion

As the pressure on process performance increases, so do the effectiveness and efficiency requirements for business process improvement methods. By modeling process weakness types and implementing them algorithmically, the Process Analyzer enables automated weakness detection in event logs, thus offering significant reductions in effort and subjectivity compared to conventional approaches in practice. Further research should address the quantification of performance losses due to process weaknesses as well as the standardized derivation of measures, including the quantification of their impact on process performance. Combined, those concepts could serve as holistic decision support for process analysis and design, which is already being pursued by the authors.

6 Deviation Detection in Production Lines Using Process Mining

In order to meet high customer requirements in terms of individualized products and short delivery times, global supply chains with strong interdependencies have formed in recent decades. In order to absorb possible external and internal disruptions, it is necessary to build robust production systems. Responding to disruptions in production is the task of the production controller, who has to make high-quality decisions in a short time. However, production systems and the dependencies between their subsystems are complicated, which makes it difficult for one person to derive suitable countermeasures. The complex processes of production planning and control therefore require appropriate decision support so that the decision quality can be improved. Currently, the production controller is insufficiently supported by IT systems, so that complex decisions are primarily made on the basis of experience. In the area of production planning and control, it is expected that decision support systems will improve the decision-making processes and reduce the probability of making wrong decisions. The recorded execution data of production systems is a great source of information that can be used to support production controllers in deviation management. In the context of process mining, this information is transformed into the form of event logs. The aim of this research is to create a decision support system to enhance the decision-making quality on the shop floor (Mühge 2018; Fischer et al. 2020). This section presents a framework and demonstrator for the detection of and reaction to disturbances on the shop floor using process mining and machine learning. Compared to the Process Analyzer application, this application supports daily operational decisions on the detection and handling of disturbances automatically, whereas the Process Analyzer is based on the user's input for the definition of deviations. The following subsections describe how the framework and demonstrator have been approached within the context of the Internet of Production.

6.1 State of the Art

To understand disturbance and deviation handling, deviations and disturbances are defined first. Unplanned and unforecasted deviations from the planned status are referred to as disturbances. Without intervention, these result in production shortfalls or performance reductions (Schwartz 2004). Deviations are characterized by comparing planned and actual values. Deviations do not necessarily have negative consequences for a production system, while disturbances normally do. If a defined tolerance range is exceeded, deviations are classified as disturbances due to the negative effects on the production system. If the tolerance range is regularly violated, this is referred to as a systematic disturbance. One of the typical tasks of production controllers is to manage the performance of production, so reducing the negative impact of disturbances is particularly important (Meissner 2017).
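As a simple illustration of this distinction, the following sketch classifies a deviation as a disturbance once a tolerance range is exceeded; the thresholds and the operationalization of "regularly" are illustrative assumptions.

# Minimal sketch of the deviation/disturbance distinction described above.
def classify_deviation(planned: float, actual: float, tolerance: float) -> str:
    deviation = abs(actual - planned)
    return "disturbance" if deviation > tolerance else "deviation"

def is_systematic(deviations: list[float], tolerance: float, min_share: float = 0.5) -> bool:
    """Flag a systematic disturbance if the tolerance is violated regularly (assumed: in at least min_share of cases)."""
    violations = sum(abs(d) > tolerance for d in deviations)
    return violations / len(deviations) >= min_share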

Current research in this field aims to support the production controller in automatic disturbance handling. Existing approaches in the field of disturbance management by production controllers can be divided into simulation-based support, methodical support, process mining techniques, and machine learning-based support. The machine learning-based approaches use case-based reasoning for knowledge representation in a rescheduling approach (Priore et al. 2015; Khosravani et al. 2019). Other approaches use Support Vector Machines (SVM) or complex event processing for the prediction of deviations and disturbances.

In this research, the focus is on process mining techniques since they are data-driven and use historical event data to interactively improve processes (Pourbafrani et al. 2021a). Each product in a production system is a process instance, and the recorded process instances are able to reveal performance and compliance deviations as well as potential root causes. Process mining deviation detection approaches are aligned with and supported by machine learning techniques (Pourbafrani et al. 2021b), which makes it possible to provide a novel deviation detection framework for production lines.

6.2 Approach and Methods

To develop a first demonstrator, a framework for decision support systems (DSS) was developed based on the structure proposed by Sauter (2010). A DSS is described as an IT-based system that enables the user to access context-relevant data, analyze it, and evaluate different alternatives for a specific decision situation (Sauter 2010). According to the tasks of the DSS, it is structured into three parts, namely, a data module, a model module, and a user interface (Sauter 2010). In the following, the adapted framework for deviation detection and its components are described. The data component uses feedback data from production and machines and has the task of gathering data from different enterprise IT systems like ERP, MES, or IoT platforms to combine as much data as possible and enable the comparison between the actual and planned states of the production system. The data component provides the data in the form of an event log, which is needed for the process mining and the subsequent machine learning components (Fig. 6).

Fig. 6 The defined and considered list of deviations w.r.t. performance and activity flow in the production lines using their event logs

The framework consists of three main modules. The first module is process mining, which discovers the current process flow of the products and orders in progress. Process mining not only enables the representation of the actual and planned process flow but also the identification of deviations in the actual process and in comparison to its planned flow. The set of labeled deviations in the context of performance and conformance that the framework is able to identify is presented in Fig. 6. In the second module, the identified deviations are labeled by a machine learning algorithm, which checks whether they constitute a disturbance. Afterward, the potential causes of the detected disturbances are identified, which can be used as a recommender system for similar disturbances in the future. This represents the third module of the framework. The process flow, identified deviations, and labeled disturbances, as well as the proposed countermeasures from the recommender system, are presented to the production controller in the user interface. There, the production controller can give feedback to the model component on whether the disturbances were labeled correctly and whether the recommended countermeasures were suitable. With this feedback, the model components are trained continuously, enabling a continuous improvement of the DSS.

6.3 Results and Conclusion

The framework was implemented as a Python web application. With the process mining algorithms, deviations are detected w.r.t. activities, resources, process instances (cases), and the overall processes. Afterward, techniques such as decision trees are trained on the detected deviations. The resulting trees are able to present the potential causes and situations that lead to the specific types of deviation. The causes are identified, and countermeasures are proposed. Furthermore, the application of process mining was evaluated in the context of a pipe manufacturer. A sample decision tree derived in the application can, for example, be based on the duration of process instances as a deviation.
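As an illustration of this step, the following sketch trains a decision tree on detected duration deviations to surface candidate causes; the attributes (machine, shift) and the toy data are assumptions, not the pipe manufacturer's data.

# Hedged sketch: training a decision tree on detected deviations to surface
# candidate causes. Attribute names and data are illustrative assumptions.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# One row per process instance: extracted attributes plus a label indicating
# whether the instance was flagged as a duration deviation by process mining.
cases = pd.DataFrame({
    "machine":   ["M1", "M2", "M1", "M2", "M1", "M2"],
    "shift":     ["day", "night", "night", "day", "day", "night"],
    "deviation": [0, 1, 1, 0, 0, 1],
})

X = pd.get_dummies(cases[["machine", "shift"]])
tree = DecisionTreeClassifier(max_depth=3).fit(X, cases["deviation"])

# The readable rules hint at situations in which deviations occur, which the
# production controller can confirm or reject via the user interface.
print(export_text(tree, feature_names=list(X.columns)))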

The purpose of the proposed framework is to identify and react to disturbances in production lines w.r.t. their event logs. The framework and its modules were designed and implemented to make an evaluation using real data possible. The framework was evaluated using simulated event data and real-world data of processes in the Cluster of Excellence "Internet of Production" with the main purpose of making decisions within certain constraints. The comprehensively considered types of deviation and the extracted attributes provide a suitable platform for the use of predictive process monitoring for the online detection of and reaction to deviations in production lines. The next step is to make the framework executable for the streaming data of production lines, which requires deployment in actual shop floor settings.

7 Conclusion

In this paper, the work of the IoP's short-term production management research group was presented. This includes five individual and partially interlinked applications that address a variety of issues in short-term production management. They pursue the common goal of data-driven decision support in the Production Control Center in order to increase both the decision quality and the decision speed in production environments on or near the shop floor. Central to this is the vision of the self-learning production system, which learns and profits from historical data. The context-specific selection and processing of data subsequently provide the basis for the research contributions achieved in the various applications.

Regarding the Predictive Quality and automated RCA application (2), three major research contributions are made: defining a comprehensive data model and an exhaustive ML framework, quantifying uncertainty for predictive models, and using feature importance as well as other model-agnostic methods to gain process insights. A similar contribution is made with the application Parameter Prediction using INN (3). By training an invertible neural network based on historical and synthetically generated data, process parameters are predicted which can be used to produce components with the desired quality properties.

The application of RL in production scheduling investigates the feasibility of applying reinforcement learning to common scheduling tasks in production and compares the performance of trained reinforcement learning agents to traditional methods used to solve such problems (4). While reinforcement learning shows promise, it has to be pointed out that challenges such as scalability and compatibility with common simulation software remain.

In both applications, Process Analyzer (5) and Deviation Detection (6), the potential of process mining in the context of production management is investigated. While the Deviation Detection application is designed to identify and mitigate performance and compliance deviations in production systems, the Process Analyzer concept enables the semi-automated detection of weaknesses in business and production processes utilizing event logs. By using process mining techniques on event logs, effort and subjectivity for weakness detection in as-is processes can be reduced without requiring a reference process model.

The applications presented currently differ partially in their implementation status and are continuously being developed further. This includes in particular the continued interlinking of the work within the research group as well as in the entire IoP. In the medium term, all developed prototypes are to be integrated into the IoP Kubernetes cluster, and in the long term, the real-time capability is to be increased for use in real production environments.