Introduction

Motivation and problem statement

Digital transformation is driving the current technological trends in industrial manufacturing. Rapid advancements in information and communication technologies (ICT) have enabled the development of advanced sensors, data acquisition systems, wireless communication devices, and distributed computing systems. These technologies enable communication between machines and humans as well as between machines and products (Weber et al., 2017). This extensive communication involves large volumes of data from various data sources. The digital transformation of manufacturing is increasingly data-oriented: cyber-physical production systems (CPPS) merge the physical with the cyber world by providing a digital representation of the physical world (Qi & Tao, 2018).

Data analytics has become a success factor for manufacturing companies (World Economic Forum, 2021). Data analytics refers to “the theories, technologies, tools, and processes that enable an in-depth understanding and discovery of actionable insight into data” (Cao, 2017). Although the foundations have existed for decades (Monostori et al., 1996), the required low-cost computing power has only recently become available. Therefore, more and more manufacturers are moving towards data-driven approaches to increase productivity and enhance manufacturing systems’ capabilities (Brynjolfsson & McElheran, 2016, 2019). Quality management is a focus area of data-driven approaches. Data analytics facilitates the continuous improvement of product and process quality (Bergs et al., 2020a). Beyond improving a traditional quality understanding, data analytics also enables a paradigm shift from reactive approaches (such as prevailing quality inspections) towards proactive intervention and adaptation (Bergs et al., 2020a; Davis et al., 2012). It is often neglected that individual manufacturing processes are linked with other processes to form a process chain. Central questions can only be answered inadequately with an isolated view of separate processes and their quality. These questions include error propagation, process–product interactions, and the robustness of the entire process chain. Data-driven approaches facilitate unveiling these aspects and promote a systemic manufacturing understanding.

The generated knowledge and insights from the data must be effectively transferred to the manufacturing system regarding decision support or direct control strategies. Current IT systems in manufacturing, including those for implementing data analytics, are predominantly isolated applications (Angione et al., 2019; Gluchowski & Chamoni, 2015). Such applications are often rigid and area-specific (Meister et al., 2019). They lack system integration and cannot adapt to changing requirements. In contrast, today’s complex and connected manufacturing systems require holistic analytics approaches.

Consequently, analytics results have not been effectively deployed in most manufacturing systems (Gröger, 2021). Platform approaches are promising for overcoming these issues. A data analytics platform is “an information system for analytical data processing” (Kemper et al., 2010). Transaction-centric and data-centric platforms can be distinguished. Transaction-centric platforms mediate transactions by matching supply and demand, whereas data-centric platforms create a holistic data system (Engelhardt et al., 2017). Internal platforms are generally data-centric (Hoffmann, 2018). They aim to constitute a single source of truth by integrating all data into a central entity and enabling data usage for decision support and control (Hoffmann, 2018; Schuh et al., 2020; World Economic Forum, 2021). Additionally, synergies can be exploited through standardization (e.g., of interfaces or services) to reduce cost, increase scalability, and foster flexible manufacturing. By shifting towards platform approaches, data and information technology can effectively link with expert knowledge of manufacturing systems and processes (Kozjek et al., 2020; Woo et al., 2018).

To capitalize on the advantages of platforms in manufacturing, they must integrate and support holistic planning and operation of entire manufacturing systems. Manufacturing systems are becoming increasingly complex, e.g., due to increasing product customization and a growing number of variants. Traditional quality management and improvement approaches are sequential and rely on a retrospective feedback loop, which loses its validity in this environment (Kuhn et al., 2018; Morariu et al., 2020). Feedback loops that leverage data analytics allow for decentralized, iterative decision-making close to the subject of planning and operation. This enables dynamic and adaptive approaches and enhances the system’s responsiveness. In addition to control, adaption becomes essential during the operation of a manufacturing system. While control has a retrospective character—adjusting manufacturing after incidents and events—adaption proactively adjusts plans and aligns them with real-time insights into the manufacturing system (Mourtzis & Vlachou, 2018; Niehues, 2016; Peruzzini & Pellicciari, 2017). Thereby, sustainability in manufacturing can be advanced, especially concerning the economic and environmental dimensions, through the reduction of scrap and rework and efficient quality inspection strategies (Chen et al., 2015; Thiede, 2018; Thiede et al., 2020).

Research objectives

Against this background, two guiding research questions (RQ) are derived. In general, manufacturing systems are complex systems with extensive interactions and dependencies between processes, products, and the environment that are not fully understood. The existing IT infrastructure of isolated applications further increases this complexity, and the lack of holistic approaches complicates achieving the system’s overarching goal. Therefore, a central, holistic entity for all manufacturing data and its usage must be investigated, i.e., a single source of truth. Considering these top-down needs, the first research question is formulated:

RQ 1

What platform architecture is required for holistically acquiring, managing, and analyzing manufacturing data and deploying generated knowledge?

Additionally, this paper draws motivation from existing research on how data-driven approaches can enhance quality management. These approaches are referred to as “data-driven quality planning and operation” and aim at reducing quality-related costs and manufacturing time while at the same time ensuring high product quality. Therefore, the following bottom-up research question is formulated:

RQ 2

How can the application of data-driven quality management approaches in manufacturing be enabled through an IT entity, i.e., a platform?

Both research questions (top-down and bottom-up) are closely related to each other. Answering the first, top-down research question will automatically contribute to answering the second, bottom-up research question. Against this background, a generic manufacturing analytics platform architecture is proposed. For validation, data-driven quality management is tested within a case study from electronics production. Thereby, a contribution toward the second research question is achieved.

The paper is structured as follows. Sect. “Theoretical background” addresses the necessary theoretical background of this paper. Sect. “State of research” analyzes the state of research and derives the research demand. Sect. “Development of manufacturing analytics platform” develops a conceptual framework to address the derived research demand. The framework is applied to a case study from the electronics industry in Sect. “Case study: data-driven quality management in electronics production”. The conclusion is drawn in Sect. “Conclusion”, and an outlook is given in Sect. “Outlook”.

Theoretical background

Cyber-physical production systems

A CPPS is a specific application of a cyber-physical system (CPS) in the area of manufacturing. A CPS refers to a feedback-loop interaction of the physical world with the cyber world of embedded computers and networks that monitor and control a physical process; the concept is strongly interrelated with the term Industry 4.0 (Lee & Seshia, 2016; Monostori, 2014; Thiede, 2018).

A CPPS consists of four levels: I. physical world, II. data acquisition, III. cyber world, and IV. decision support/control (Thiede, 2018; Thiede et al., 2016). Figure 1 graphically shows the main elements of a CPPS. The different levels are introduced in the following subchapters.

Fig. 1 CPPS framework

Physical world

The physical world represents the system of interest. In the case of cyber-physical production systems, this is the manufacturing system. According to CIRP, the International Academy for Production Engineering, manufacturing is defined as “all functions and activities directly contributing to the making of goods” (Segreto et al., 2019). Thus, manufacturing comprises “the value-adding processes, namely, fabrication and assembly, and the organizational functions” (Segreto et al., 2019).

Manufacturing encompasses multiple hierarchical levels: products, processes, and process chains. In manufacturing, raw materials are transformed into final products by value-adding to meet customers’ requirements (Schenk et al., 2014; Westkämper, 2006). The value-adding process consists of a physical transformation (e.g., transforming geometry, state of matter, or chemical composition). Therefore, manufacturing is called a “transformation process” (Dyckhoff & Spengler, 2007). Multiple processes are connected by a material flow to form a process chain (Tönshoff & Denkena, 2011). In a process chain, products are gradually transformed. Products between the initial and the final state are called “intermediate products” (Wengler, 1996). On the product level, both intermediate and final products can be characterized by their product features. These features are quantitative (e.g., temperature) or qualitative (e.g., material composition) descriptions of the product as well as definable and deterministic measurements (Klocke et al., 2014; Wuest et al., 2011). Product features change with every manufacturing process due to the transformation processes. The final features of a product are referred to as product properties. (Filz et al., 2020a, 2020b; Wuest, 2015).

Organizational functions and management mainly concern the planning and operation of manufacturing systems. The traditional approach to planning and operating manufacturing systems considers them complicated systems. Complicated systems are predictable and can be understood through knowledge (ElMaraghy et al., 2019). Managing complicated systems therefore requires experts (Snowden & Boone, 2007). In the manufacturing context, these experts centrally make decisions on the planning and operation of the system. A sequential approach is followed: before starting production, a manufacturing system is planned, while afterward, it is operated (Caggiano et al., 2019). Planning of the manufacturing system determines what is to be produced, when, and how (D’Addona et al., 2019). It includes process design and equipment design (Eversheim, 2002). During the operational phase, manufacturing systems are controlled. Control of manufacturing systems consists of process monitoring and control, quality monitoring and control, shop floor control, and inventory control (Caggiano et al., 2019).

Over the last years, manufacturing systems have become increasingly complex (ElMaraghy et al., 2019; Kuhn et al., 2018). This applies particularly to high technologies such as battery cell manufacturing, casting, or electronics production with many process–product relationships. A complex system incorporates uncertainty, so the outcome is not entirely predictable or controllable (ElMaraghy et al., 2019). It is also dynamic, resulting in constantly changing entities and their non-linear relations (Cilliers, 1998). Following the evolution of manufacturing systems’ complexity, the understanding of planning and operation must also evolve. The traditional expert-driven, sequential planning and operation is not sufficient anymore because a complex system is unpredictable and cannot be understood entirely (Morariu et al., 2020). Instead, a decentralized, dynamic, and adaptive approach that interacts with the system is required. In this regard, adaption becomes a complement to control during the operation of a manufacturing system.

Data acquisition

Data acquisition starts with identifying the relevant data. These data can be classified based on their structure. In principle, a distinction is made between structured, semi-structured, and unstructured data (VDI & VDE, 2020a). Structured data, such as process parameters or measurement results, follows a predefined format into which the data fits. Semi-structured data (e.g., work plans) follows a basic structure but may include unstructured elements. Unstructured data (e.g., text or photos) has no predefined format; in manufacturing, for example, images can be acquired from visual quality inspections. (VDI & VDE, 2020a).

The identified data originates from single or multiple data sources. Based on the accessibility of these sources, data can be divided into private, club, and public data (Otto et al., 2016). In the manufacturing context, private data is most common. Private data is owned by a company that can regulate data access, often limiting access to internal stakeholders. Club data is generally available to companies that jointly manage access. Club data is a result of inter-company cooperation throughout the value chain (e.g., supplier and original equipment manufacturer (OEM)) or in specific industrial ecosystems (e.g., OEM and start-up). Public data is available to every user and often provided by a public entity.

Independent of accessibility, data sources in manufacturing contexts are very heterogeneous and thus require specific interfaces, communication protocols, etc., for data gathering (Cui et al., 2020; Gröger et al., 2012; Ismail et al., 2019; VDI & VDE, 2020b). Generally, data is either operational or process data. Process data originates directly from the analyzed process and also includes inspection data. In contrast, operational data is derived from existing legacy IT systems such as Manufacturing Execution Systems (MES), Enterprise Resource Planning (ERP), or Product Lifecycle Management (PLM). (Cui et al., 2020; Kemper et al., 2010; VDI & VDE, 2020b).

Based on the information of the data source and the data format, data accessibility is reviewed and evaluated concerning the intended usage in the cyber world. One option is to store data in files. This data can be acquired manually by uploading through graphical user interfaces (GUIs). Alternatively, an extraction, transformation, and loading (ETL) process can be used if the directory is defined and accessible. Data stored in software applications can be accessed, e.g., through the application’s database via queries. Continuously running applications (ERP, PLC, etc.) could have standardized communication protocols (e.g., Open Platform Communications Unified Architecture (OPC UA), Modbus, PROFINET, PROFIBUS) that can be accessed and used to acquire data. Sensors might also have communication protocols or can be read out via analog or digital signals. More often, sensors are read out by small internet of things (IoT) devices and can constitute an entire IoT device network. (Filz et al., 2020a).
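
As an illustration, a minimal sketch of acquiring a single machine value via OPC UA is given below, using the python-opcua library; the endpoint URL and node identifier are hypothetical placeholders, not part of the described concept.

```python
# Minimal sketch: reading one machine value via OPC UA (python-opcua assumed).
from opcua import Client

client = Client("opc.tcp://192.168.0.10:4840")  # hypothetical machine endpoint
try:
    client.connect()
    # hypothetical node exposing, e.g., a process temperature
    node = client.get_node("ns=2;s=Machine1.ProcessTemperature")
    value = node.get_value()
    print(f"Acquired value: {value}")
finally:
    client.disconnect()
```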

An essential aspect of data acquisition is the creation of a unique identifier (ID) for tracking and tracing. A unique identifier is assigned to each data point. From an engineering perspective, identifiers are required to ensure product tracking and tracing. From an IT perspective, identifiers facilitate efficient data management and ensure that the relevant data can be found and accessed for data analytics.

The choice of data management, if needed at all, is made with respect to the cyber world and the intended applications. Since IoT devices have become more durable and performant, they can also be used for data management, storing small amounts of data either decentralized or on the edge. Furthermore, the data can be stored centrally, either on-premise, if the IT support is guaranteed, or in the cloud. For data management, various technologies exist for on-premise or cloud data storage (e.g., Hadoop distributed file system (HDFS), MySQL, MongoDB, Amazon Web Services (AWS)). (Filz et al., 2020a).
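
To make one of these options concrete, the following hedged sketch stores a single process data point in a document database; MongoDB via pymongo is assumed here, and the connection string, database, and field names are illustrative only.

```python
# Sketch: central storage of one process data point (MongoDB/pymongo assumed).
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # illustrative connection
collection = client["manufacturing"]["process_data"]

collection.insert_one({
    "product_id": "4711A3-P01",        # unique identifier (cf. previous section)
    "process": "reflow_oven",          # illustrative process name
    "parameter": "peak_temperature_C",
    "value": 245.3,
    "timestamp": datetime.now(timezone.utc),
})
```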

Cyber world

The acquired data is deployed for data analytics use cases within the cyber world. Relevant techniques include, e.g., data mining, simulation, hybrid modeling, or visual analytics (Filz et al., 2020a).

Manufacturing analytics is defined as the holistic data analysis across all manufacturing processes to increase transparency and improve processes (Cviko & Böing, 2019). It aims to identify best practices, react quickly to events, and anticipate potential problems before they can affect product quality, yield, or cost (Halvorsen, 2006; Lade et al., 2017). Similar approaches target the application of data analytics in manufacturing, including “production analytics” (Schuh et al., 2019) and “manufacturing intelligence” (Davis et al., 2015).

Four maturity levels characterize data analytics and manufacturing analytics. These levels vary regarding the quality of the generated information and knowledge and, thus, the required human input. On the lowest level, descriptive analysis enhances the visibility of manufacturing systems by describing what happened. One step further, diagnostic analysis recognizes interrelations and interprets them to derive root causes. Hence, diagnostic analysis increases transparency. On the third level, predictive analysis learns patterns from historical data to predict future outcomes and their probabilities. Prescriptive analysis minimizes human input and directly leads to decision-making (cf. Fig. 1). Depending on the achievable degree of autonomy, either decision support is provided to enable the human to make improved decisions, or the decision is entirely automated. (Linden et al., 2013; Schuh et al., 2020).

Data-driven approaches from the cyber world impact a wide range of manufacturing tasks. These areas include quality control, maintenance, root cause analysis, fault diagnostics, job shop scheduling, and manufacturing process optimization. All maturity levels are reflected in the application of data analysis in manufacturing. Many data-driven approaches target the quality domain. Quality tasks can be classified into the characterization of product and process quality, quality prediction, classification of quality, and parameter optimization (Köksal et al., 2011; Rostami et al., 2015).

Decision support

The results of the cyber world techniques are utilized for decision support for different stakeholders within a company (e.g., production engineers, shop floor workers) or for automatically controlling the technical systems. In both cases, the feedback loop of the CPPS can be closed. The goal of production engineering is to deploy the acquired knowledge and improve the manufacturing system. To reach this goal, the human is required to remain in the loop. However, the extent to which the human is integrated ranges from active involvement in the described elements to a supervisory role through appropriate visualizations. The long-term goal is to develop fully-fledged CPPS that “possess high self-adaptation capabilities, including self-diagnosis, self-configuration, and self-optimization” (Bergs et al., 2020b).

Planning and operation of quality management

Quality directly affects a manufacturing system’s productivity, especially in mass production, where quality deviations lead to large amounts of scrap and rework. Quality is “the degree to which a set of inherent characteristics or features of an object fulfills requirements” (DIN EN ISO, 2015). Within quality management, these inherent characteristics or features are set. Quality management is based on four main elements or phases: quality planning, quality assurance, quality control, and quality improvement (DIN EN ISO, 2015). Within quality planning, specific quality objectives are set, and operational processes to fulfill these quality objectives are put into place. Quality assurance, in turn, focuses on providing confidence that specific quality requirements, such as quality attributes, are fulfilled. The proof that the previously set quality requirements are met is given throughout the phases of quality control and improvement (DIN EN ISO, 2015).

In manufacturing, quality control is often performed offline via a quality inspection (QI) after an intermediate product has been completed. This can be achieved by measuring certain quality features defined within the quality requirements of this intermediate product, process parameters, or process requirements (e.g., process capability) (Wirtz et al., 1993). QI can be characterized and designed based on the inspection tasks performed. An inspection plan provides these tasks. Based on (VDI, VDE & DGQ, 1985), inspection plans include the following specifications (a minimal data-structure sketch follows the list):

  • Relevant inspection characteristics (what)

  • Inspection point (when)

  • Inspection type (how)

  • Inspection extent (how much)

  • Inspection place (where)

  • Inspection personnel (who)

  • Inspection equipment (whereby)
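
As an illustration of how such an inspection plan could be represented digitally, a minimal sketch follows; the class and field names are our own and not prescribed by the cited guideline.

```python
# Sketch: one inspection task as prescribed by an inspection plan
# (fields mirror the specifications above; naming is illustrative).
from dataclasses import dataclass

@dataclass
class InspectionTask:
    characteristic: str   # what is inspected
    point: str            # when in the process chain
    inspection_type: str  # how (e.g., visual, tactile)
    extent: str           # how much (e.g., 100%, sampling)
    place: str            # where (e.g., inline, lab)
    personnel: str        # who performs the inspection
    equipment: str        # whereby (inspection equipment)

task = InspectionTask("solder paste volume", "after stencil printing",
                      "visual (SPI)", "100%", "inline", "automated", "SPI system")
```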

During manufacturing, process parameter variations and product feature deviations occur, leading to product-specific conditions. However, the QI specifications are usually defined before the start of production and are only updated through continuous improvement processes, such as optimization workshops. This causes a discontinuity between the planned specifications and the real conditions during production operation.

The concept of quality gates (QG) was introduced for the operation of quality inspections. The idea is to systematically divide a defined process chain into different quality-relevant decision points. At these QGs, the product’s perceived quality is inspected, ensuring that defined quality attributes are met before further processing (Wuest et al., 2014). The quality gates aim to detect deviations early so that effective countermeasures can be initiated in time (Schmitt & Pfeifer, 2015). A quality gate is a review in which the product characteristics are measured and compared with the requirements placed on the attributes during inspection planning (Cooper, 2008; Prefi et al., 2014; Wildemann, 2010). However, the QG considers not only the past (through quality inspection) but also the future by predicting future product characteristics and including them in the decision (Stiller, 2015; Wildemann, 2010). This preview predicts the final product properties based on data analytics. If a predicted final property does not fulfill the requirements, the manufacturing cannot continue as initially planned. A possible countermeasure is adapting the following manufacturing processes to manufacture a conforming product. Generally, physical and virtual quality gates can be distinguished. In a physical QG, the quality inspection is performed based on physical principles, e.g., vision inspections, to characterize the intermediate or final product.

Additionally, recent advancements toward digitalization have made it possible to indirectly measure or infer the quality of a product during manufacturing processes using data-driven models, simulations, or soft sensors (Filz et al., 2020a). In a virtual QG (VQG), the QI is performed by analyzing data, e.g., process parameters, disturbance variables, or data of previous intermediate products. In this way, VQGs can improve inspection processes by reducing inspection times and increasing the accuracy and flexibility of the inspection technology through near-real-time processing (Filz et al., 2020a; Schnell & Reinhart, 2016). Thus, VQGs can enable fast and comprehensive quality estimations of parts, inline process control, process chain control, process parameter adaption, and advanced tracking and tracing (Filz et al., 2020a).
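
For illustration, the decision logic behind such a VQG can be sketched as follows; the trained prediction model, the feature vector, and the specification limits are assumptions for this sketch and not part of the cited approaches.

```python
# Sketch: decision logic of a virtual quality gate (VQG).
# `model` is assumed to be a trained regressor predicting a final product
# property from process data; specification limits are illustrative.
import numpy as np

def virtual_quality_gate(process_record, model, lower_spec, upper_spec):
    """Predict a final product property and decide how to proceed."""
    predicted = model.predict(np.asarray(process_record).reshape(1, -1))[0]
    if lower_spec <= predicted <= upper_spec:
        return "continue"  # predicted property fulfills the requirements
    # otherwise, adapt subsequent processes or trigger a physical inspection
    return "adapt"
```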

State of research

Platform approaches offer the opportunity to integrate multiple CPPS into a single IT entity. In this way, interfaces for data acquisition and means for visualization can be harmonized. Against this background, the current state of research on manufacturing analytics platforms and architectures is investigated and extended with a particular focus on quality management (Fig. 2).

Fig. 2 PRISMA chart of the structured literature review process, based on Lame (2019)

Background for selection and evaluation of existing approaches

To review the current state of research on analytics platforms in manufacturing, a structured literature review (SLR) was conducted at the beginning of 2022. A structured literature review aims to identify, evaluate, and interpret all available research concerning a specific research question (Kitchenham et al., 2009; Lame, 2019; Tranfield et al., 2003). The following search question is investigated in the present case: “What are existing manufacturing analytics platforms, and what are their constituting elements?”

The block building approach by Guba (2008) is utilized to derive search strings for a free text search. Three blocks were identified, referring to the target, action, and methodology of the reviewed research topic (cf. Fig. 3).

Fig. 3 Block building approach for search string identification

The target is the system of interest, namely the manufacturing system. Data analytics is the action that is performed regarding the target. The methodology facilitates the application of data analytics, which in this case is represented by the platform approach. Synonyms for the described block topics are considered.

Following a scoping study in the SCOPUS database with 5,509 results, the search strings were limited to the most relevant ones (cf. the bold search strings in Fig. 3, leading to 2,429 results). After that, subject areas were limited. Due to the interdisciplinary character of the analyzed field, a comprehensive set of potential subject areas was chosen. Considered subject areas include Computer Science, Engineering, Decision Sciences, Mathematics, Energy, Business, Management and Accounting, Materials Science, and Environmental Science. The resulting 1,725 research findings form the literature base for this structured literature review. Sixteen additional publications were identified by other means, mostly based on cross-references, own experience in the area, and grey literature. These publications were added to the literature base.

For identifying the most relevant research, eligibility criteria were defined:

  • Content:
      • Publications focusing on discrete manufacturing
      • Platform and its architecture constitute the core of the publication
      • Data analytics forms the main action of the developed platform
  • Publication language:
      • English
      • German
  • Accessibility:
      • Publication is accessible online

Publications are excluded from the SLR if focusing on generic reference models for platforms (such as RAMI 4.0 (DIN, 2016) or the Industrial Internet Reference Architecture (Lin et al., 2019)). An additional exclusion criterion concerns the scope of the developed platform approach: approaches that consider manufacturing as only one potential target out of several are to be excluded.

Using the criteria explained above, iterative filtering was conducted by reviewing the abstracts and screening and analyzing full texts (Moher et al., 2009). Ultimately, 21 publications are considered in the literature review and assessment. Out of these 21 publications, 16 were identified by the SLR, while five were among the added publications (cf. Fig. 2).

Evaluation of existing approaches

The selected approaches are evaluated against a set of criteria. The criteria are clustered into scope, level of investigation, and methodological aspects. Figure 4 provides an overview of the current state of research on manufacturing analytics platforms (Al-Gumaei et al., 2019; Angione et al., 2019; Arantes et al., 2018; Bao et al., 2017; Beecks et al., 2018; Bousdekis et al., 2019; Gramegna et al., 2020; Gröger, 2015, 2018; Gröger et al., 2016; Gyulai et al., 2019; Illa & Padhi, 2018; Jun et al., 2019; Kassner et al., 2017; Liu & Jiang, 2016; O’Donovan et al., 2016; Sarnovsky et al., 2018; Tao et al., 2018; Wang & Luo, 2021; Wei et al., 2019; Woo et al., 2018). A heatmap is used for visualization. A red field within the heatmap indicates that a criterion is not fulfilled at all, whereas a green field means full fulfillment; partial fulfillment is also possible.

Fig. 4 Heatmap evaluating the state of research on manufacturing analytics platforms

Figure 4 shows significant red spaces within evaluation categories and for single criteria. Notwithstanding, most criteria are (at least to some degree) accounted for by multiple approaches. The results of the category scope show that none of the approaches fulfills all criteria. Some approaches address the control of manufacturing systems. However, these do not address the planning of manufacturing systems simultaneously. An apparent research demand for an approach covering the planning and operation of manufacturing systems together can be identified. Moreover, almost no approach addresses quality management and the integration of humans in the proposed solution.

The evaluation results for the level of investigation clearly show that the analyzed approaches focus on one level instead of multiple levels simultaneously. As a result, interactions between the different levels can hardly be analyzed or modeled.

The category focusing on the methodological aspects shows that some approaches consider isolated methodological aspects, but none of them comprehensively covers the various aspects. However, to answer various data analytics questions in manufacturing, the flexible use of one or more of these methods is needed.

Overall, no single approach fulfills all evaluation criteria to a sufficient degree. In summary, there is a research demand for an approach that covers all or at least most criteria in combination. Based on the evaluation and analysis of the literature research, the following detailed research demand was identified:

  • D1. Development of a platform architecture that enables the application of solution modules with different scopes (e.g., quality management).

  • D2. Development of a platform approach that supports both planning and operation of quality management in manufacturing systems.

  • D3. Development of an approach that considers the manufacturing levels of product, process, and process chain.

  • D4. While the consideration of manufacturing processes appears to be well established, product and process chain perspectives must be fostered towards holistic analytics processes that account for interactions and dependencies.

  • D5. The concept shall integrate different data sources into one central database. This includes integrating state-of-the-art data acquisition and data management methods and enabling all four analytics maturity levels.

  • D6. The concept should be applicable for multiple data analytics use cases in different manufacturing industries.

  • D7. Development of a platform architecture that enables continuous model updates.

  • D8. Platforms must incorporate a frontend that allows for human-machine interaction and comprises comprehensive visualizations.

  • D9. There is a lack of continuously integrating humans into the platform for leveraging expertise and cognitive abilities.

  • D10. There is a need to account for different user roles (e.g., shop floor workers, manufacturing engineers, quality engineers) and to customize the user experience based on individual needs.

Development of manufacturing analytics platform

To detail the development of a manufacturing analytics platform, a general framework to structure its core elements and functionalities is proposed in subchapter 4.1. Thereon, a more detailed architecture is given in subchapter 4.2.

Manufacturing analytics platform framework

The current research state shows a need to develop an architecture for a manufacturing analytics platform. Therefore, a logical framework for such a manufacturing analytics platform is proposed in Fig. 5. The platform encompasses three horizontal layers: data management, modeling, and visualization. The platform is seen as a link between various business objectives and the physical manufacturing system. To address various challenges and solve existing problems in the manufacturing system, various solution modules (modules 1-n) are launched vertically on the platform, each accessing the different layers and elements individually. The solution modules can be expanded if necessary.

Fig. 5 Conceptual framework of a manufacturing analytics platform

The platform architecture is framed bottom-up by the manufacturing system encompassing production and management (Segreto et al., 2019). The holistic understanding of production is based on the different levels of investigation: product level, process level, and process chain level. Raw materials are transformed into final products in the manufacturing process to add value (Schenk et al., 2014; Westkämper, 2006). Process parameters define the processes. Multiple processes are connected by a material flow to form a process chain (Tönshoff & Denkena, 2011). During this gradual transformation, products between the initial and the final state are called “intermediate products” (Wengler, 1996). Both intermediate and final products can be characterized by their product features on the product level. Planning, adaption, and control concern the management of the manufacturing system. In this regard, planning relates to all procedures concerning the organizational processes and the physical design of the manufacturing system conducted in advance (Caggiano et al., 2019; D’Addona et al., 2019). In contrast, control and adaption target the manufacturing system’s lifecycle operational phase.

Control has a retrospective character by adjusting manufacturing after incidents and events. For example, the process parameters of a machine are adjusted after consistently producing defects, or inventory is refilled once reaching the safety stock. In contrast, adaption proactively adjusts plans based on changing circumstances and aligns them with real-time insights into the manufacturing system (in line with (Mourtzis & Vlachou, 2018; Niehues, 2016; Peruzzini & Pellicciari, 2017)). This variation of replanning is triggered by continuous monitoring, requiring data-driven approaches (Monostori et al., 2010; Mourtzis & Vlachou, 2018; Zhang et al., 2017). For example, if a product is detected to be outside the tolerance limits, the process parameters of the subsequent manufacturing step could be adjusted, as could the allocation of the remaining quality inspections. However, the distinction between planning, control, and adaption is not strict; the management of manufacturing systems requires extensive interaction and feedback between them.

From the top down, the manufacturing analytics platform is subject to business objectives. The target system is traditionally composed of quality, cost, and time (Schmitt & Pfeifer, 2015). Additionally, environmental aspects are becoming increasingly important due to legislation, regulation, and customer demand (Hauschild et al., 2019a; Herrmann, 2010; Moltesen et al., 2019). In general, data analytics is not self-fulfilling. Instead, it must serve the business objectives and support a value proposition. Therefore, considering business objectives is essential for assessing data analytics results and derived modifications to the manufacturing system. Moreover, these objectives and the existing manufacturing system define the different solution modules launched based on the platform.

Data management as a first layer comprises “the practices, architectural techniques, and tools for achieving consistent access to and delivery of data” required for applications and business processes (Gartner, 2021). Thereby, it covers the fields of action of data scientists and data engineers. These two user groups are data experts. A dedicated environment is required because a heterogeneous landscape of data sources and data management techniques requires specialist knowledge (Beecks et al., 2018; Gröger, 2021). For data management, a single source of truth shall be constituted by storing and providing all relevant data (as suggested by, e.g., Schuh et al. (2020)). This data storage integrates a large amount of historical data and continuously expands it with real-time data. Based on this data, batch data analytics can be conducted to support use cases with analytics models.

The modeling layer supports manufacturing and quality engineers as data analytics practitioners. They combine extensive knowledge of manufacturing systems with a limited understanding and knowledge of data analytics. Against this background, these user groups are primed for translating data analytics results into modifications of the manufacturing system. Therefore, they are involved in building and training the analytics models, building on the previous activities described for data management. In this way, the domain experts are enabled to build the models with little to no manual input.

The resulting model either acts as visual decision support or is to be deployed for real-time operation in the manufacturing system (Cui et al., 2020). In the case of decision support, the existing domain knowledge of the experts is successfully integrated into a data-driven approach. The results and impacts on the manufacturing system must be evaluated for both possible model outcomes. This could be achieved by integrating a simulation environment (as suggested by, e.g., Kibira et al. (2015) and Qi & Tao (2018)).

The visualization layer supports achieving user-centricity. Different user disciplines are primarily differentiated by their specific user roles. To this end, each discipline serves the needs of dedicated users, integrates their user stories, and reflects the underlying technical entities and processes. Additionally, the proximity to the manufacturing system is varied. This differentiation is derived from different latency requirements: operational tasks might require real-time processing. Therefore, different visualization solutions must be provided for different user groups. This can range from simple graphical plots to direct interaction with the user, e.g. through visual analytics.

However, it must be noted that the subdivision is carried out for structuring purposes only—no isolated subsystems are intended. Instead, all entities should be flexibly combined and follow a microservices approach. A microservice is a small, autonomous service that provides a single business function, is technically self-contained, and is deployed independently (Knoche & Hasselbring, 2019; Newman, 2015; Taibi et al., 2018). Each microservice is independently developed, deployed, versioned, scaled, and operated by a cross-functional, autonomous team (Jamshidi et al., 2018; Knoche & Hasselbring, 2019). Microservices aim to enable flexible, scalable, and modular information systems that can evolve over time (Bogner et al., 2019). In this context, microservices serve as solution modules to improve the manufacturing system.
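
To illustrate, a minimal sketch of a solution module realized as a microservice is shown below; FastAPI is assumed as the web framework, the endpoint and payload are illustrative, and the scoring logic is a placeholder for a trained analytics model. Each such module can be deployed, versioned, and scaled independently of the platform’s other services.

```python
# Sketch: a solution module as a self-contained microservice (FastAPI assumed).
from typing import List

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="quality-prediction-module")  # illustrative module name

class ProcessData(BaseModel):
    features: List[float]  # process parameters of one product

@app.post("/predict")
def predict(data: ProcessData):
    # placeholder scoring; in practice, a trained analytics model is loaded here
    score = sum(data.features) / max(len(data.features), 1)
    return {"quality_score": score}

# run with: uvicorn module:app (assuming this file is named module.py)
```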

Time-sensitive tasks such as the real-time operation of a data analytics model cannot be realized in a central entity, for instance, a cloud-based platform. Consequently, manufacturing system operation is proposed to be implemented on the edge level, i.e., in proximity to the entities of the manufacturing system (Zietsch et al., 2019). In this position, it acts as a real-time collector and real-time pre-processor of process data and a real-time executor of the developed analytics models. In this regard, the manufacturing system operation will primarily run autonomously. Its predominant users, manufacturing engineers and shop floor workers, serve as data analytics consumers. In their work, they benefit from data-driven decisions and data-driven decision support. Besides consumption, they must also monitor and supervise the platform’s operation.

Platform architecture

Derived from the conceptual framework proposed in Sect. “Manufacturing analytics platform framework”, the conceptual architecture is presented in Fig. 6. The boxes depict the platform entities and processes, whereas the arrows represent their relations. Vertically, the architecture is divided into four layers: data acquisition layer, data storage layer, data analytics layer, and data visualization layer. Furthermore, data governance ranges across all layers. Horizontally, two main branches can be identified to allow hybrid data processing. The entities and processes are either assigned to a centrally managed infrastructure (e.g. a cloud environment) for batch data processing or an edge-level infrastructure for real-time data processing.

Fig. 6 Conceptual backend architecture

The left branch comprises entities and processes of a centrally managed infrastructure. It allows batch data analytics based on historical data from various heterogeneous data sources acquired within the data acquisition layer. Typical sources include user-generated data, such as lab protocols. Automated data ingestion processes enterprise data (e.g., from PLM, ERP, or SCM systems) as well as data acquired directly from the manufacturing system. The latter may originate from machines (e.g., process parameters) or inspection stations (e.g., inspection results) and can be available in real time. The communication between the shop floor and data ingestion can be carried out using a gateway.

For the development of the analytics models, the process of “Knowledge Discovery in Databases (KDD)” introduced by Fayyad et al. (1996) is followed. It comprises data selection, data preprocessing, data transformation, data mining, and interpretation (Fayyad et al., 1996). Currently, this can be achieved by assigning data-heavy aspects to data experts. Additionally, domain experts are responsible for model building and training using visual analytics, with data experts supporting them. However, as a target vision, the entire model development process might be automated by an AutoML solution. Building on the analytics models, their implications for the manufacturing system can optionally be validated in a multi-level manufacturing simulation.
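
A compact sketch of these KDD steps on batch inspection data is given below; pandas and scikit-learn are assumed, the file and column names are hypothetical, and the decision tree is only one illustrative choice of mining technique.

```python
# Sketch: KDD steps on historical inspection data (pandas/scikit-learn assumed).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# selection: load historical SPI data (hypothetical file and columns)
df = pd.read_csv("spi_history.csv")

# preprocessing: remove incomplete records
df = df.dropna(subset=["solder_volume", "solder_offset", "aoi_result"])

# transformation: scale features for mining
X = StandardScaler().fit_transform(df[["solder_volume", "solder_offset"]])
y = df["aoi_result"]  # conforming / non-conforming label

# data mining: learn a simple quality classifier
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)

# interpretation: assess predictive performance as a first plausibility check
print(f"Test accuracy: {clf.score(X_test, y_test):.2f}")
```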

The right branch, in contrast, should be realized on an edge level. Hence, the platform entities are distributed close to the manufacturing entities, enabling the execution of real-time tasks. These real-time tasks range across all four layers: from automated, real-time data ingestion of machine and inspection data through real-time data preprocessing and real-time data analytics to visualizations of real-time data and results.

Connections between the branches are realized in two ways. First, data that is preprocessed in real time is not only used for real-time data analytics but also ingested automatically into a data lake. Data lakes handle large volumes of data without changing the formats (Miloslavskaya & Tolstoy, 2016). Therefore, a data lake is well suited for big data such as manufacturing process data or inspection data. In this way, the database is constantly extended, and the need for manual data ingestion is reduced once a connection is implemented. Second, the models for real-time data analytics are developed based on historical batch data but deployed on the edge level. In this regard, it is essential to define rules for the retraining and updating of analytics models to ensure high performance (Amini & Chang, 2020).
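
One simple way to express such a retraining rule is sketched below: a rolling window of prediction outcomes triggers retraining once accuracy falls below a threshold. The window size and threshold are illustrative assumptions, not values from the cited work.

```python
# Sketch: a rolling-accuracy rule for triggering model retraining.
from collections import deque

class RetrainingMonitor:
    """Flags retraining once rolling accuracy drops below a threshold."""

    def __init__(self, window: int = 500, threshold: float = 0.95):
        self.outcomes = deque(maxlen=window)  # True if prediction was correct
        self.threshold = threshold

    def record(self, prediction, actual) -> None:
        self.outcomes.append(prediction == actual)

    def needs_retraining(self) -> bool:
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence collected yet
        return sum(self.outcomes) / len(self.outcomes) < self.threshold
```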

In addition, data governance spans across all layers. Data governance generally involves organizational structures treating data as an enterprise asset (Abraham et al., 2019). To maximize the value of this asset, data governance provides a framework for decision-making rights and responsibilities in terms of the use of data in an enterprise (Otto, 2011; Weber et al., 2009). The main aspects of data governance concern data quality, security, and privacy. High data quality is essential for the success of data analytics approaches (Gröger et al., 2016; Kotsiantis et al., 2007; Thoben et al., 2017). Data quality is vital across all layers. Data ingestion and preprocessing steps have a central role because data quality can be increased there (e.g., outlier detection or handling missing values).
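
For illustration, a small sketch of such data-quality steps during ingestion is given below; pandas is assumed, and the imputation strategy and quantile limits are illustrative choices rather than prescribed ones.

```python
# Sketch: basic data-quality handling during ingestion (pandas assumed).
import pandas as pd

def clean_column(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Coerce types, impute missing values, and drop gross outliers."""
    df = df.copy()
    df[col] = pd.to_numeric(df[col], errors="coerce")  # enforce numeric type
    df[col] = df[col].fillna(df[col].median())         # handle missing values
    low, high = df[col].quantile([0.01, 0.99])         # illustrative limits
    return df[df[col].between(low, high)]              # simple outlier removal
```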

The conceptual architecture forms the starting point for implementing the proposed manufacturing analytics platform. A microservices approach is well-suited for implementation as it represents the de facto industry standard. Following the microservices architecture paradigm, modularity, flexibility, and platform distribution are ensured. In this way, the novel concept represents a counterpoint to the existing, predominantly isolated applications. In this context, however, it should be noted that the architecture is not to be understood as complete; it can be flexibly extended or adapted as required.

Case study: data-driven quality management in electronics production

This case study presents an architecture to track and analyze the product property propagation along the manufacturing system. First, subchapter 5.1 describes the case study. This is followed by the case study-specific design of the platform architecture in subchapter 5.2. Finally, exemplary modeling results are shown in subchapter 5.3. This case study is to be understood as an example of possible applications in the context of the platform. Further applications can be found in related publications; for more details, see Filz et al. (2020a, 2020b, 2021).

Case study description

The use case covers the printed circuit board (PCB) assembly in electronics production. PCBs are produced in discrete manufacturing systems with multiple manufacturing processes and inline quality inspections (May & Spanos, 2006). The adopted process chain is shown in Fig. 7.

Fig. 7 Process chain for PCB assembly (picture by Kitchenham et al., 2009)

It is based on surface-mounted technology (SMT). Following this method, the electrical components are placed directly onto the surface of a PCB and soldered conductively in an oven. Depending on the PCB type, components might be soldered to both sides of the board. If this is the case, the process chain is passed through twice.

In the first step, the solder paste is applied to the surface of the unassembled PCB during the stencil printing process. This is followed by solder paste inspection (SPI), a visual inspection that checks the solder paste position. A defined amount of solder paste is required for a reliable connection of the components, and fluctuations in the solder paste volume must be minimized. Subsequently, the products are transferred to the pick & place (P&P) machine. In several successive steps, the P&P machine places the individual components (e.g., resistors, capacitors, microchips) onto the solder paste. Once all components are placed onto the PCB, it passes through a reflow oven. The solder is melted, and the components are soldered to the PCB. The automated optical inspection (AOI) finally checks the quality of the soldered PCBs. It concerns the position of the components as well as the connection of the solder pins.

A large production volume and a low takt time characterize PCB assembly. Therefore, the production processes and material flows are fully automated. Moreover, PCB assembly is subject to high quality standards. Consequently, defect rates are low. This results in a small number of analyzable defective components and, thus, a high class imbalance in the data. This data must be labeled according to the inspection results as either data of a conforming or non-conforming product. However, pseudo errors present a significant challenge: conforming products are regularly classified as non-conforming by automated quality inspections. Therefore, a manual re-classification of non-conforming products is often necessary.

It is necessary to consider the complete process chain instead of single and isolated processes to improve the overall quality and reduce scrap and rework. Therefore, it is essential to understand the dynamics and interdependencies within the manufacturing system. Knowledge about defects’ propagation and early detection is essential to derive proper QM strategies. Concerning data acquisition, two areas can be identified that pose particular challenges. First, each machine and inspection station records its respective data, which must be tracked automatically and stored in a central database. Second, attention must be given to the uniform labeling of data. Tracking and tracing of individual PCBs and components must be ensured during production, especially for product property propagation analysis.

Architecture for product property propagation analysis

Reiterating the current quality management situation, it must be noted that expert-driven improvements still play a significant role. The limits of human capabilities result in a lack of consideration of interactions within the manufacturing system and a lack of flexibilization of quality inspections. Moreover, quality is reviewed in the form of product and process conformity instead of being actively produced. Against this background, approaches have been developed that aim at data-driven planning, operation, and flexibilization of entire manufacturing systems. These approaches are referred to as “data-driven quality management”. This includes the introduction of virtual quality gates, the data-based characterization of products, and the derivation of inspection strategies. However, it thus far remains unclear how to implement and deploy these approaches in manufacturing systems. While current IT systems are predominantly isolated, rigid, and area-specific applications, it was elaborated that today’s complex and connected manufacturing requires holistic analytics applications. A manufacturing analytics platform follows exactly such a holistic analytics approach. Therefore, the generic platform architecture (cf. Sect. “Development of manufacturing analytics platform”) is adapted to data-driven product property propagation analysis. The architecture is applied within the use case of PCB assembly. The resulting architecture is shown in Fig. 8.

Fig. 8 Backend architecture for the implementation of data-driven quality management

The architecture comprises the development of three types of data analytics (DA) models that support the product property propagation analysis:

  1. DA models are developed for clustering similar PCBs at a given quality gate (e.g., SPI and AOI) into intermediate product classes (IPC).

  2. A second model captures the propagation of IPC affiliation across different quality gates to analyze the product property propagation.

  3. A classification model enables a specific PCB’s assignment based on its real-time properties.

The development of the analytics models is assigned to the central entity. It is based on historical data. This includes user-generated inspection data (from SPI and AOI) and operational data from enterprise systems. Additionally, the database is constantly extended by automatically adding further inspection data from the manufacturing system. Data from all the described sources is ingested into one database. For effective data management, the assignment of IDs for identification is an essential step. Regarding the described use case of PCB assembly, new IDs must be created. Each panel of four PCBs has a distinct barcode in the physical assembly process. Each PCB, however, consists of multiple components that each have multiple pins. Therefore, defining IDs for each PCB on a panel is necessary, e.g., by appending a position code to the barcode. The required data is extracted from the database in batches for model development. Here, domain knowledge might be necessary to assess data validity, e.g., because of technical changes in the PCBs or the manufacturing system. As described above, three models are considered. The development of these models could be achieved following a variety of methods.
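
A minimal sketch of such an ID scheme is shown below; the barcode value and the position format are hypothetical placeholders for illustration.

```python
# Sketch: deriving per-PCB IDs from a panel barcode (format is hypothetical).
def pcb_id(panel_barcode: str, position: int) -> str:
    """Append a position code to the panel barcode to identify one PCB."""
    return f"{panel_barcode}-P{position:02d}"

# one panel carries four PCBs (cf. the use case description)
ids = [pcb_id("4711A3", pos) for pos in range(1, 5)]
print(ids)  # ['4711A3-P01', '4711A3-P02', '4711A3-P03', '4711A3-P04']
```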

In the first step, the models for identifying IPCs at a specific quality gate (DA model 1, e.g., for SPI and AOI) and their propagation across different quality gates (DA model 2) must be developed. They enable the domain experts, i.e., quality engineers and manufacturing engineers, to plan product phenotypes (deployment I) and adaptive inspection strategies (deployment II). Thereby, analytics models 1 and 2 provide decision support. Visualizations appear to be well suited to convey this decision support. For example, the propagation of product properties can be visualized using a Sankey diagram. Moreover, the DA models themselves can be explored by the users. In general, the planning of the domain experts leads to modifications of the manufacturing system. Therefore, these modifications should be evaluated to ensure effectiveness. Different evaluation options can be pursued here, e.g., data-driven, probability-based validation.

Furthermore, a model for classifying PCB phenotypes is developed (DA model 3). In contrast, this model must allow real-time execution and deployment on the edge-level entities for manufacturing adaption (deployment III). However, domain experts are still involved in exploring and evaluating the model, e.g., via visualization of the model.

The deployment of the classification model is carried out on an edge device. It uses real-time streaming data from the machines and quality inspections to classify PCBs automatically as phenotypes. In this way, an adaption of the corresponding inspection strategy is performed.

A configurable and web-based dashboard provides manufacturing engineers and operators with the required insights for monitoring and supervising the process.

The web application is developed in the Anaconda 3 distribution of Python 3.9. The frontend was built with the Streamlit 1.10.0 framework and is shown in Fig. 9. It consists of two areas. On the left-hand side, the module selection area allows the user to choose between different modules related to the data analytics and visualization layers in Fig. 8. The interaction and visualization area enables user interactions with and visualization of the data and results.

Fig. 9 Frontend of the web application for data-driven quality management
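
A minimal sketch of this two-area layout is given below; the module names are illustrative, and the actual application comprises further modules and interactions.

```python
# Sketch: two-area dashboard layout with Streamlit (module names illustrative).
import streamlit as st

# module selection area (left-hand sidebar)
module = st.sidebar.selectbox(
    "Select module",
    ["IPC definition", "Propagation analysis", "Decision support"],
)

# interaction and visualization area
st.title(module)
if module == "Propagation analysis":
    st.write("Sankey diagram of product phenotype propagation rendered here.")
```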

The web application dashboard can be used by different stakeholders, such as data scientists and engineers, process experts, and operators, because the functionality ranges from data analytics-related model deployment to process decision support. Data engineers can define the model structure based on the underlying data acquisition and preprocessing. Process experts can use this model structure in different analytics modules to extract cross-process interrelationships. Additionally, operators can use the model for direct decision support on the shop floor and for predicting later product features during production.

Data governance is essential across all steps described above. Ensuring high data quality is paramount because high-quality data is both the foundation and prerequisite for high-quality data analytics and, hence, for high-quality planning and operation of the PCB assembly. Moreover, data security and privacy must be guaranteed.

Cluster-based product property propagation analysis

To improve the overall quality and reduce scrap and rework, it is necessary to consider the overall process chain instead of single and isolated processes. Hence, understanding the dynamics and interdependencies within the manufacturing system is essential. To derive proper QM measures, knowledge and transparency about the propagation and detection of changes in intermediate product features (IPF) along the process chain, as well as about intermediate product states (IPS), intermediate product classes (IPC), and product phenotypes, are essential. Figure 10 gives a graphical overview of these terms in the context of a manufacturing system.

Fig. 10 a Characterization of intermediate product states and b product phenotypes within a manufacturing system (May & Spanos, 2006)

An IPF is a quantitative (e.g., temperature) or qualitative (e.g., material composition) description of the product and a definable, deterministic measurement. An IPS combines IPFs that characterize an intermediate product at a specific observation point along the manufacturing process chain (Filz et al., 2020b; Wuest, 2015; Wuest et al., 2014). To analyze the propagation of different products along the manufacturing process chain, IPFs are clustered to define IPCs (see Fig. 10a).

A data-driven approach based on unsupervised machine learning is chosen to identify IPCs. This approach enables decision support by describing the propagation of, and analyzing the interdependencies between, IPCs within the manufacturing process chain. Based on this, the behavior of different products within a manufacturing system can be tracked, and specific control strategies can be derived based on product phenotypes. The implementation within the web application, representing the clustering results of the different IPCs, is shown in Fig. 11.

Fig. 11 IPC definition within the developed web application
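As a hedged illustration of this step, the following sketch clusters the IPFs recorded at one observation point into IPCs using k-means; the feature columns, file name, and number of classes are assumptions, since the source does not specify the clustering algorithm or its parameters.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical IPF table for one observation point (e.g., after SPI):
# one row per board, one column per intermediate product feature.
ipf = pd.read_csv("spi_features.csv")  # placeholder file name
features = ["paste_volume", "paste_height", "offset_x"]  # placeholder IPFs

# Standardize the IPFs so that no single feature dominates the distance.
X = StandardScaler().fit_transform(ipf[features])

# Cluster boards into intermediate product classes (IPCs);
# the number of classes is an assumption, not taken from the source.
kmeans = KMeans(n_clusters=3, random_state=0)
ipf["ipc"] = kmeans.fit_predict(X)

print(ipf.groupby("ipc")[features].mean())  # inspect class characteristics
```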

The concept of the product phenotype has been introduced to track and trace the propagation of intermediate products along the manufacturing system. Product phenotypes are defined by a combination of different IPCs along the process chain that describes the characteristics of a particular product within the manufacturing system (see Fig. 10b) (Filz et al., 2021).
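Continuing the sketch above, a board's phenotype can then be formed by concatenating its IPC labels across all observation points. The station names below follow the inspection stages of the case study, while the label values are invented for illustration.

```python
import pandas as pd

# Hypothetical per-station IPC labels for each board, e.g., obtained by
# running the clustering sketch at every observation point.
labels = pd.DataFrame({
    "board_id": [1, 2, 3],
    "SPI-bot":  [0, 0, 2],
    "AOI-bot":  [1, 1, 0],
    "SPI-top":  [1, 1, 1],
    "AOI-top":  [2, 2, 0],
})

stations = ["SPI-bot", "AOI-bot", "SPI-top", "AOI-top"]

# A phenotype is the ordered combination of IPCs along the process chain.
labels["phenotype"] = labels[stations].astype(str).agg("-".join, axis=1)
print(labels[["board_id", "phenotype"]])
```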

To gain insight into the characteristics of IPCs within the process chain, the propagation from the SPI and AOI inspection stages to the final product quality can be visualized in the interaction and visualization area of the developed dashboard (Fig. 12).

Fig. 12 Visualization of product phenotype propagation within a Sankey diagram in the web application
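Such a Sankey diagram could be rendered in the dashboard, for instance with Plotly, which Streamlit displays natively; this is a sketch, not the original implementation. The node labels follow the phenotype path described below, while the flow counts are invented for illustration.

```python
import plotly.graph_objects as go

# Nodes: IPC labels per inspection stage plus the final quality outcome.
nodes = ["SPI-bot: 0", "AOI-bot: 1", "SPI-top: 1", "AOI-top: 2", "good", "bad"]

fig = go.Figure(go.Sankey(
    node=dict(label=nodes),
    link=dict(
        # Flows between consecutive stages; the counts are illustrative only.
        source=[0, 1, 2, 3, 3],
        target=[1, 2, 3, 4, 5],
        value=[120, 110, 105, 95, 10],
    ),
))
fig.update_layout(title="Product phenotype propagation")
fig.show()  # in Streamlit: st.plotly_chart(fig)
```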

The resulting Sankey diagram can be used in different ways. For example, a shop floor worker can use the visualization to identify defective parts that should be removed from the manufacturing system. Moreover, quality engineers can use it to analyze and understand the interdependencies of process–product relationships along the process chain to improve the overall performance of the manufacturing system.

The visualization shows one dominant product propagation path from “SPI-bot: 0”, “AOI-bot: 1”, “SPI-top: 1”, and “AOI-top: 2” to a “good” final product quality. Based on these propagation results, different QM strategies can be derived. For example, all products following the highlighted path have a high chance of being good products. Hence, the inspection strategy for these products could be adjusted by lowering the number of inspections to reduce the overall inspection effort.

Conclusion

A novel approach for a manufacturing analytics platform was proposed to overcome the described research gap. It combines state-of-the-art IT concepts with procedures for planning and operating quality management measures in multi-stage manufacturing systems. The definition of concept objectives described the underlying vision, and the derived concept requirements guided the development process. Based on these, a conceptual framework was introduced that integrates the platform into the business objectives and the manufacturing system.

Additionally, the platform can integrate different solution modules to support the overall quality management and improvement process of manufacturing systems. Here, the focus lies mainly on the requirements for tracking and tracing products along the manufacturing process chain. Finally, the technical architecture for implementing the manufacturing analytics platform was developed. Vertically, it consists of four layers: the data acquisition, data storage, data analytics, and data visualization layers. Moreover, data governance spans all layers. Horizontally, two main branches enable hybrid data processing: entities and processes are assigned either to a centrally managed infrastructure (e.g., a cloud environment) for batch data processing or to an edge-level infrastructure for real-time data processing.

The concept is distinguished from existing approaches by the following characteristics: Considering an edge-level implementation to complement a central entity enables real-time capabilities. Moreover, users are continuously integrated into the processes, leading to a human-centric approach. Finally, the approach emphasizes deploying the generated knowledge about the manufacturing system. This is achieved by integrating the approach into the existing processes for managing manufacturing systems, where data-driven planning and control of quality management is considered. Additionally, a multi-level manufacturing simulation can be integrated to evaluate findings. The application to a case study on electronics production demonstrated that the developed platform concept and the corresponding architecture support different analytics tasks, as shown by an analysis of product property propagation along the manufacturing process chain.

The provided architecture significantly enhances traditional quality management, moving it towards a flexible, iterative, and data-driven approach.

Outlook

As outlined in the previous sections, significant contributions could be made compared to the state of research. Nevertheless, there are diverse opportunities for future research to extend the contributions of the present work:

Application to other industrial sectors and use cases

The concept is tailored to the needs of multi-stage discrete manufacturing. To this end, it was applied to quality planning and operation in PCB assembly. Future research should transfer the application to other industrial sectors. An extension towards other analytics use cases, such as process optimization or predictive maintenance, is also necessary. For comparability, varying either the industrial sector or the use case, but not both at once, is suggested.

Quantitative concept validation

The developed concept of a manufacturing analytics platform was validated qualitatively in the present work. There is a need to extend the validation quantitatively by developing a platform prototype. To this end, a user interface that facilitates model development and visualization is of particular interest.

Implementation with commercial software frameworks

Despite the general need for a prototypical implementation, an implementation based on commercial big data frameworks, e.g., Apache Hadoop, Apache Spark, or Apache Storm (Govil, 2019), could be investigated. This way, future work could focus more on integrating IT into manufacturing engineering and the processes for planning and operation. Furthermore, scaling the platform approach from prototype to commercial application would be simplified.

Integration of further solution modules

The developed concept proposes the integration of different solution modules. Potential solution modules must be identified, implemented, and validated in future work. In this context, exploiting the platform’s advantage of using data along the entire process chain is essential. Here, the implementation of VQG bears great potential to improve quality inspection strategies or adapt process parameters to “save” products.

Integration of environmental evaluation towards life cycle assessment (LCA)

Data-driven approaches can contribute toward sustainability in manufacturing, and the importance of environmental sustainability is ever-increasing. Consequently, environmental KPIs are considered in the business objectives of the developed framework. Building on this, Life Cycle Assessment should be integrated to evaluate the manufacturing system against environmental impact categories (Hauschild et al., 2019b).