1 Introduction and motivation

Coined by Cunningham [1] in the classical software engineering domain, Technical Debt (TD) describes a context in which a technical compromise is taken, e.g., delivering low-quality code to meet a short deadline. The technical shortcut can provide a short-term benefit but may cause a long-term negative impact on the system quality or the productivity of engineers [2]. The TD concept can be transferred to the engineering of automated Production Systems (aPS), which are widely used nowadays to assemble products in various fields such as automated packaging, pharmaceutical production or food processing. aPS are a subclass of mechatronic systems. Typically, aPS are developed by engineers from multiple disciplines such as mechanical, electrical or software. Other fields such as hydraulic, pneumatic, sensor, process or drive technology may also be involved in aPS development. There are strong interdependencies between all those disciplines. Analysing TD in the software discipline at one aPS company, Besker et al. [3] suggested further research should study mechanical and electrical fields in addition to software.

During aPS development, technical decisions from various disciplines and phases are validated and described in several documents such as product description, operation manual, risk analysis documentation or technical product requirements. In this work, a meta-analysis is performed to address engineering TD in the aPS domain. The study considers the documents produced during the development of aPS in the healthcare sector following the V‑Model [4] and GAMP 5 [5] (i.e. Good Automated Manufacturing Practice). The examined documents include (1) functional specifications and design specifications from different disciplines (e.g. hardware, software, electrical/electronics) following the V‑Model, and (2) the risk trace matrix and corresponding internal guidelines following GAMP 5.

While TD in the classical software engineering domain can be identified with code analysis, the work on TD in the aPS domain is still limited [6]. Recent work on TD identification in the aPS domain still focuses on the software discipline [7]. Thus, a systematic method to identify TD in other fields of the aPS domain is lacking. This work is the first qualitative study on multi-disciplinary engineering documents that examines state-of-the-practice risk analysis to identify and classify engineering TD in the aPS domain. The main contributions of this paper are: (1) a systematic TD identification approach on engineering documents in the aPS domain and (2) an enlargement of the TD classification with TD sub-types from the validation of process control systems’ engineering. The work thus yields a sound basis for TD management to improve overall equipment effectiveness (OEE) in the aPS domain and enlarges the body of knowledge on TD.

The remainder of the paper is structured as follows: In the next section, the state of the art on validation of process control systems of automated manufacturing systems and on TD in aPS is outlined. Subsequently, research questions are derived. In Sect. 3, a meta-analysis approach on engineering documents to identify TD is described. The results are presented in Sect. 4. The paper concludes with a summary and an outlook on future work.

2 State of the art in production automation TD and validation to identify open research questions

This section presents an overview of the validation of process control systems of automated manufacturing systems, followed by an outline of TD-related work, from which the research questions are derived in detail.

2.1 Validation of process control systems of automated manufacturing systems

In the area of qualification for automation technology, the companies have to follow specific guidelines to meet requirements from the authorities such as Food and Drug Administration (FDA) or European Medicines Agency [8,9,10]. Good Manufacturing Practices (GMP) should be applied to ensure safety and traceability to comply with the increasingly strict regulations [11]. Alam [12] indicated that “pharmaceutical Process Validation is the most important and recognised parameters of cGMP” (Current Good Manufacturing Practices) from the FDA. Glennon [13] reported that in the validation field of computer control systems, “besides factory acceptance tests, identification of potential problems minimises the risk of costly field corrections”.

Smith et al. [14] proposed a mathematical model to support risk analysis on the systems. Chowdary et al. [15] tried to deploy lean tools together with cGMP principles. Examining system life cycle, risk management and system testing, Samson et al. [16] indicated that a life-cycle approach would best suit process systems engineering validation. There are some industrial guidelines such as NAMUR recommendations and worksheets [17] or GAMP. The current generally accepted policy for the validation of computer-aided systems is GAMP 5 (the latest version of GAMP), according to [18].

GAMP 5 is a risk-based life-cycle approach to making industry compliant computerised systems; thus, aPS manufacturers can acquire the certificate to market the manufacturing systems. GAMP 5 provides the critical fundamental concept to identify and report risks throughout the life-cycle of systems. For instance, the severity of a fault’s impact on product quality is offset against the fault occurrence probability. The calculation determines the risk classes from I to III. The risk class is offset against the likelihood that a fault is identified before damage is caused and results in risk priority categories from Low, Medium to High. The detection or prevention methods are designed to achieve technical improvements and reduce risk. Consequently, control measures and qualification tests are defined as part of the completed assessment in accordance with company-specific rules. GAMP 5 recommends two rounds of risk priority evaluations for each risk reported, and the risk assessment should be carried out for all changes. Based on GAMP, Neuhaus et al. [19] defined validation procedures for the process control system of a large scale manufacturing plant.
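The two-stage rating described above can be sketched as a pair of lookup tables. This is a minimal illustration only: the function name, the three-level scales and all matrix cells are assumptions made for this sketch and are not quoted from GAMP 5 or from any company guideline, so they would need to be checked against the applicable rules before use. Risk classes I to III are written as the integers 1 to 3.

```python
# Illustrative two-stage risk rating in the style described in the text.
# All matrix cells below are ASSUMPTIONS for this sketch, not quoted values.

LEVELS = ("low", "medium", "high")

# Stage 1: (severity, probability) -> risk class (1 = most critical).
RISK_CLASS = {
    ("high", "high"): 1, ("high", "medium"): 1, ("high", "low"): 2,
    ("medium", "high"): 1, ("medium", "medium"): 2, ("medium", "low"): 3,
    ("low", "high"): 2, ("low", "medium"): 3, ("low", "low"): 3,
}

# Stage 2: (risk class, detectability) -> risk priority.
# Detectability is the likelihood that a fault is noticed before damage occurs.
RISK_PRIORITY = {
    (1, "low"): "High", (1, "medium"): "High", (1, "high"): "Medium",
    (2, "low"): "High", (2, "medium"): "Medium", (2, "high"): "Low",
    (3, "low"): "Medium", (3, "medium"): "Low", (3, "high"): "Low",
}


def rate_risk(severity, probability, detectability):
    """Return (risk_class, risk_priority) for one risk entry."""
    for level in (severity, probability, detectability):
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level!r}")
    risk_class = RISK_CLASS[(severity, probability)]
    return risk_class, RISK_PRIORITY[(risk_class, detectability)]
```

Keeping both stages as explicit tables makes it straightforward to compare a company-specific scale with the recommended one, which is the kind of discrepancy examined later in this paper.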

In pharma and food & beverage, Feigenbaum et al. [20] introduced methods to control food migration into GMP. Meyliana et al. [21] proposed a blockchain technique to minimise the circulation of counterfeit drugs. Huysentruyt et al. [22] presented an application of GAMP 5 for AI-based systems. In recent work, Andriani et al. [23] reported a case study of a GAMP application to analyse and prioritise risks in the production process. Suseno et al. [24] emphasised knowledge sharing and GMP performance. Poor exchange of know-how could be a challenge during the implementation of the MES (Manufacturing Execution System) in the food and beverage industry, according to Chen et al. [11].

The risks documented by the GAMP 5 risk-based approach are potential sources for quality improvement. Specifically, the reported risks might reveal technical compromises taken along the engineering process, where one discipline initiates a compromise that impacts the related fields. Examining the compromises could identify new know-how that could benefit aPS manufacturers and the customers, as TD might impact system quality [2]. Furthermore, a cross-disciplinary TD study is necessary [3]. The following section reviews related work on TD in the automation domain and adjacent research fields to identify the research gap.

2.2 Technical debt

In the classical software engineering domain, Li et al. [2] established a TD classification according to the phases of the software development process. The proposed classification tree consisted of 10 TD types, such as Requirements TD, Architectural TD or Code TD. Each TD type was then further categorised into sub-types based on the causes of TD. Among the TD types, the most studied TD was Code TD [2]. Avgeriou et al. [25] indicate that TD always relates to cost. Unexpected significant cost overruns in software development projects could occur if TD is not monitored [26]. TD could have an impact on the product delivery date [27,28,29] and the organisation’s profitability [30,31,32]. Engineers sometimes suggest a TD removal plan, but management is often unwilling to approve the improvement due to its focus on business value [33]. Additionally, the staff choosing to initiate TD (e.g., designers) could be different from those dealing with TD (e.g., maintenance engineers) [34, 35].

In the embedded software engineering domain, which is nearer to the aPS domain, Martini et al. [36] examined architecture documentation to study TD. However, the focus was on Architectural TD in software only.

In the aPS domain, Besker et al. [3] analysed the work of software engineers at one aPS company in Scandinavia. They reported that the engineers spend significant extra effort due to TD. The reported effort “on average 32% of the development time” on TD was confirmed by managers. The work implies that more investigation should be conducted to understand TD in other disciplines of aPS, such as mechanical or electrical. Vogel-Heuser et al. [37] conducted case studies regarding fault handling in companies developing aPS to increase OEE. However, the work focused on the software discipline only. A survey conducted by Dong et al. [6] with machine and plant manufacturers indicated that TD might introduce unscheduled costs between disciplines and phases. However, TD awareness at these companies is still low. Vogel-Heuser et al. [38] analysed the interdisciplinary effects of TD in companies working with mechatronic products. They found that TD “emerges most frequently in the first three stages of the life cycle” (i.e. Requirements, Architectural and Design). The work was based on the interview method, and examining engineering documents was not in focus.

Based on Li et al. [2], Vogel-Heuser and Rösch [39] introduced an initial enlargement of TD classification for aPS linked to additional phases, such as manufacturing or commissioning. Although the disciplinary aspects were considered, this initial enlargement focused only on introducing some coarse-grained TD types (e.g. Manufacturing TD or Commissioning TD). Capitán and Vogel-Heuser [40] presented and classified some TD sub-types in aPS; however, the focus was on the software discipline. Waltersdorfer et al. [41] and Biffl et al. [42] reported some TD sub-types related to the data exchange process between the disciplines, such as data models or knowledge representation (Documentation TD). However, the classification of TD was not explicitly considered.

Extensions are necessary to improve the TD classification for aPS. First, the initial enlargement in [39] focused on coarse-grained TD types. No concrete TD sub-type is presented in, e.g., Manufacturing TD or Commissioning TD. Thus, the causes of TD in additional phases of aPS should be explored to establish new TD sub-types for those coarse-grained TD types. Some specific phases of aPS might last up to decades, e.g. operation and maintenance time [34]. Therefore, exploring TD in these phases would help avoid high costs from TD. Second, the research on disciplinary aspects of TD is still limited, and the focus has mainly been on the Code TD type. Consequently, particular boundary conditions during the development of aPS, such as interwoven disciplines with different priorities or target conflicts, are not yet sufficiently considered, and assessing TD under these conditions is necessary. More TD sub-types related to multiple disciplines should therefore be studied to enhance the classification of other existing TD types defined by Li et al. [2].

To the best of the authors’ knowledge, no other studies have been undertaken to explore aPS engineering documents to identify TD in different disciplines of the aPS domain. Thus, this work analyses documents of aPS companies to study engineering TD (RQ1). Once identified, the newly detected TD items are systematically classified (RQ2), and the extent of their benefit is evaluated with industrial experts (RQ3). In this way, this work aims to fill the research gap and to contribute to a potential increase in OEE in the aPS domain.

2.3 Research questions

Based on the evaluation of relevant work in the previous section, three research questions are formulated as follows:

  • RQ1: How to analyse documents to study engineering TD?

  • RQ2: How are newly detected TD items systematically classified?

  • RQ3: To what extent are the reported TD cases beneficial in the work of industrial experts?

Following the idea of Li et al. [2] in the classical software engineering domain, a study should examine typical settings in which TD occurs and which constraints, situations, and individuals or disciplines are involved. As outlined in Sect. 2.1, in some fields such as pharmaceutical production or food processing, aPS need to meet some predefined requirements, which are often validated using GAMP. Following the GAMP principle, quality must be validated at each engineering stage; thus, the study should examine not only decisions or risks documented from engineering phases but also the documents from early/preparation phases. To be precise, engineering TD could be determined by analysing design specifications (RQ1.1), as the design is one of the most frequent stages in which TD emerges [38]. Further, the risk trace matrix documents centralised not only the identified risks in the project phases but also the risks during the entire operation; thus, those recorded and traceable risks could be a potential source to identify engineering TD (RQ1.2). There, the identified TD might relate to specific engineering phases (RQ2.1) or may link to cross-engineering stages (RQ2.2), as aPS changes or evolutions are multi-stage and multi-disciplinary [3, 6, 38]. Furthermore, an evaluation with industrial experts is necessary to confirm the identified TD types and the identified use cases. The reported TD cases could provide a soundness proof concerning engineering risks (RQ3.1). In addition, the usefulness of reported TD cases could be perceived as a potential improvement for system quality as well as risk management process since uncovered TD might pose risks (RQ3.2).

The sub-research questions are formulated and presented in Table 1.

Table 1 Summary of the results addressing the three Research Questions

3 Meta-analysis approach on engineering documents

Initially, a meta-analysis was conducted with engineering documents provided by a world-leading machine and plant manufacturer, which follows the V‑Model and GAMP 5. The functional specifications follow the V‑Model process, and the risk evaluation is in accordance with GAMP 5. The documents describe engineering and risk aspects in the electrical, software, and mechanical disciplines. The risk trace matrix document also includes feedback from the customer of the participating company.

The solution mentioned in each risk entry was assessed to check whether it was suboptimal; in this way, TD could be detected. Two workflows with different approaches were designed for this study to achieve a TD classification. The two methods supported the researchers in systematically reviewing and identifying TD in very long documents (more than a hundred pages). On the one hand, the bottom-up approach (cp. Figure 1a) focused on distinguishing between implementation and guidelines/standards. On the other hand, the top-down approach (cp. Figure 1b) analysed risk entries.

Fig. 1 Meta-analysis approaches on engineering documents to identify Technical Debt: (a) bottom-up approach, (b) top-down approach

This analysis design could also be reused in other studies which aim at a similar goal.

Regarding approach 1 (cp. Figure 1a), the first step was the identification of implemented guidelines in specifications of different fields (i.e. functional, hardware, software, electrical/electronics). Some examples of identified aspects were

  • the hardware classification followed GAMP 5

  • the creation of the software design specification is based on GAMP 5

  • the emergency stop and safeguards have been designed and manufactured in compliance with the Machinery Directive (2006/42/EC)

In parallel, potential guidelines that the specifications should follow were explored to identify insufficient conformity to standards or best industry practices. Aspects for which no procedure was mentioned were studied. This step was an iterative process. Some examples of identified aspects were

  • HMI (Visual Displays of Machine Status) did not mention whether specific guidelines or techniques were followed. Some potential candidates are

    • VDI/VDE 3699 Process control using display screens

    • OMAC [43] stack light concept

  • Operating Modes did not mention any guidelines or standards followed. The potential candidate is

    • OMAC/ISA PackML
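The first step of the bottom-up approach, listing the aspects of a specification that cite no guideline and pairing them with potential candidates, could be supported by a small script such as the following sketch. The data structure, the function name and the candidate catalogue are hypothetical and only mirror the examples given above; in this study, the review itself was carried out manually.

```python
from dataclasses import dataclass, field


@dataclass
class SpecAspect:
    """One aspect of a specification and the guidelines it explicitly cites."""
    name: str
    cited_guidelines: list = field(default_factory=list)


# Hypothetical catalogue of candidate guidelines per aspect (from the examples above).
CANDIDATE_GUIDELINES = {
    "HMI": ["VDI/VDE 3699", "OMAC stack light concept"],
    "Operating Modes": ["OMAC/ISA PackML"],
}


def aspects_without_guideline(aspects):
    """Map each aspect that cites no guideline to its candidate guidelines."""
    return {
        aspect.name: CANDIDATE_GUIDELINES.get(aspect.name, [])
        for aspect in aspects
        if not aspect.cited_guidelines
    }


spec = [
    SpecAspect("Hardware classification", ["GAMP 5"]),
    SpecAspect("HMI"),
    SpecAspect("Operating Modes"),
]
gaps = aspects_without_guideline(spec)
```

Here, `gaps` contains only the HMI and Operating Modes aspects, each with the candidate guidelines against which the implementation can be compared in the second step.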

The second step was TD identification by studying the identified guidelines and comparing them with the implementations. In addition, by examining the “Cause of Malfunction” and “Detection/Prevention” fields of each risk entry in the risk trace matrix, the documented solution was checked for any “sub-optimal” or “quick-fix” solution, which often initiates TD according to Cunningham [1]. Specifically, the analysis considered some aspects identified in Lueddemann [44] as potential sources of TD, for instance, inadequate reciprocity among hazards, faults, risk control measures, design, implementations, and tests. Other examples are low traceability of residual risks, redundant data sets within technical documentation, or inconsistent use of unique data identifiers across projects. The second step was an iterative process. Subsequently, the detected TD items were added to the well-known TD classification tree in Li et al. [2]. The detected TD items were further classified following the detailed activities related to the risk assessment process at different phases in Hegde et al. [45]. For instance, some risk-related TD items initiated during the design phase were classified into a new TD sub-type named Risk analysis TD within Design TD. As another example, some risk-related TD items were organised into a new sub-type named Risk assessment TD within Documentation TD.

In the third step, snowballing was performed. Recent work and aspects relating to the identified TD (in specifications) were explored to search for potential guidelines. The second step was an iterative process. If more aspects were detected, the process returned to the second step. Otherwise, the process reached the final goal.

Regarding approach 2 (cp. Figure 1b), entries in a risk trace matrix were identified in the first step. This step contained two sub-steps: (1) risk aspects were studied, and (2) implemented or potential guidelines were explored. An example of results was “Guard locking does not lock”, and a possible procedure was OMAC. This step was an iterative process.

In the second step, TD was identified by two sub-steps: (1) studying guidelines and (2) comparing with implementation. The mentioned guidelines here are specifications from the industrial partner. This step was also an iterative process.

Similar to the bottom-up approach, snowballing was performed in the third step. There, related aspects of identified TD (in specifications) were studied to search for potential guidelines. Some related work on the identified aspects was also explored. The second step was an iterative process. If more aspects were detected, the process returned to the second step. Otherwise, the process reached the final goal.
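The loop shared by both approaches (identify TD for an aspect, snowball to related aspects, return to the identification step until nothing new appears) can be sketched as a simple worklist procedure. The two callables `find_td` and `snowball` are hypothetical placeholders for the manual review activities described above; the sketch only illustrates the control flow and its termination condition.

```python
def iterate_until_stable(initial_aspects, find_td, snowball):
    """Repeat TD identification and snowballing until no new aspect is found.

    find_td(aspect)   -> list of TD items detected for that aspect (step 2)
    snowball(td_item) -> iterable of related aspects to examine next (step 3)
    """
    seen = set(initial_aspects)     # aspects already queued or examined
    pending = list(initial_aspects)
    td_items = []
    while pending:                  # an empty worklist means the final goal is reached
        aspect = pending.pop(0)
        for item in find_td(aspect):
            td_items.append(item)
            for related in snowball(item):
                if related not in seen:   # only new aspects trigger another round
                    seen.add(related)
                    pending.append(related)
    return td_items
```

Tracking the aspects already seen guarantees termination even when two aspects refer to each other during snowballing.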

In the following, some examples of the TD identification step are presented, and threats to validity are reported.

3.1 Examples of TD identification step

This section presents the rationale of TD identification. Three selected examples are used to illustrate how and why a case of TD was detected.

Regarding the first example, there was a discrepancy between risk assessment guidelines and practice (Documentation TD). In the internal risk assessment guideline, the company defined descriptions and assigned points for each function in the Severity, Probability and Detection categories. The points were then multiplied to determine risk classes and risk priorities. For example, 1, 3, or 5 points could be given for the Severity rating; the evaluation levels 2 and 4 would not be used. However, in practice, Severity ratings with 4 points still appeared in the risk trace matrix.
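A consistency check of this kind is easy to automate. The following minimal sketch assumes a hypothetical dictionary layout for risk entries; only the permitted Severity points (1, 3 and 5) and the multiplication of the three ratings are taken from the description above.

```python
# Severity points permitted by the internal guideline described in the text.
ALLOWED_SEVERITY = {1, 3, 5}


def risk_score(severity, probability, detection):
    """Multiply the three ratings, as described for the internal guideline."""
    return severity * probability * detection


def severity_violations(entries):
    """Return the ids of entries whose Severity rating is not permitted."""
    return [e["id"] for e in entries if e["severity"] not in ALLOWED_SEVERITY]


# Hypothetical risk trace matrix excerpt (field names are assumptions).
matrix = [
    {"id": "R1", "severity": 3, "probability": 2, "detection": 1},
    {"id": "R2", "severity": 4, "probability": 1, "detection": 2},
]
violations = severity_violations(matrix)
```

In this hypothetical excerpt, entry R2 is reported because it uses a Severity rating of 4.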

In the second example, there was a discrepancy between practice and GAMP 5 (Documentation TD). The internal risk analysis document noted that the risk evaluation process would be in accordance with GAMP 5. Five evaluation levels were defined for detection and probability criteria. However, only three evaluation levels were recommended by the GAMP 5 guideline (see Appendix M3 of GAMP 5 [5]).

Regarding the third example, a risk was intentionally judged as low priority although the risk was high and other options were available (e.g. software bug) (Documentation TD). In this use case, according to the sequential entry numbers, three identical malfunctions occurred at a safety device. The safety function was triggered due to operator intervention and software bugs. For each risk entry, function tests were defined, and the priorities were lowered from High/Medium to Low. Below these three risk entries in the risk trace matrix, the fourth risk entry noted that a related malfunction occurred (e.g. safety function was not triggered), according to the sequence number. The impact documented in the fourth one was significant (e.g. operator safety might be impacted). However, in the first evaluation, the priority of the fourth entry was ranked as Low; thus, no second evaluation was performed, in line with the internal risk assessment guideline. Therefore, a potential TD was identified from this use case.
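Entries of the kind found in the third example could be pre-selected for manual review with a simple heuristic, sketched below. The keyword list and the field names are assumptions made for illustration; such a filter can only point a reviewer to candidate entries and does not replace the expert judgement applied in this study.

```python
# Hypothetical keywords indicating that operator safety is affected.
SAFETY_KEYWORDS = ("operator safety", "loss of safety")


def low_rated_safety_risks(entries):
    """Return ids of entries rated Low although the impact mentions safety."""
    flagged = []
    for entry in entries:
        impact = entry["impact"].lower()
        mentions_safety = any(keyword in impact for keyword in SAFETY_KEYWORDS)
        if entry["first_priority"] == "Low" and mentions_safety:
            flagged.append(entry["id"])
    return flagged
```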

3.2 Threats to validity

First, there might be a bias when deriving the (sub-)research questions. The research questions follow the ideas and suggestions from recent publications to reduce this bias.

Second, the engineering documents were provided by an industrial partner. Some information was missing in the provided risk trace matrix document. For example, a risk entry reported “damage to product”, but no “Detection/Prevention” was defined. In addition, there was no second assessment for this risk, although the first assessment rated the risk as high. Thus, there is a threat that, if the industrial partner did not provide complete or accurate documents, the authors would not be able to recover all the missing information. To mitigate this threat and to improve the reliability of the results presented in the paper, an evaluation was conducted with two experts from the industrial partner to clarify open questions and missing information. For the concrete example, the second risk assessment round had not yet been completed, and the company has started an internal review on how to compensate for the increased risk. The reliability of the risk assessment is improved, as the information contained in the risk trace matrix was confirmed as correct and could be contextualised with the help of the expert assessments. The incomplete information is classified as the “Lack of evaluation for identified risk” TD sub-type in Risk assessment TD. However, the accuracy of the information still needs to be reviewed in future evaluations due to a significant difference between the two assessment rounds. In addition, some risks are assessed as low, although the risk impacts seem significant. For instance, a risk reporting “Loss of safety for the operator” was rated low.
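The two kinds of missing information mentioned above (no “Detection/Prevention” defined, and no second assessment although the first one rated the risk as high) can be expressed as a completeness check. The sketch below assumes a hypothetical dictionary layout for a risk entry; the field names and finding texts are illustrative only.

```python
def completeness_findings(entry):
    """List the completeness problems of one risk entry (may be empty)."""
    findings = []
    if not entry.get("detection_prevention"):
        findings.append("no detection/prevention defined")
    if entry.get("first_priority") == "High" and entry.get("second_priority") is None:
        findings.append("no second assessment for a high risk")
    return findings
```

An entry such as the “damage to product” example would yield both findings and would therefore be a candidate for clarification with the experts.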

Third, during the analysis phase, the data (i.e. engineering documents) were analysed independently by the first author, then discussed with the second author to reduce data interpretation errors.

Fourth, the analysed engineering documents were from one industry partner. There is a threat regarding the transferability of results, i.e. whether the results are also valid for other companies. In healthcare or medical technology, following GAMP is a straightforward way for manufacturers to be certified by sophisticated regulators such as the FDA, since the standardised procedures of GAMP help ensure product quality and maintain efficiency in operations. Thus, it can be assumed that other companies in the same sector that are required to certify their machines must apply similar workflows to document conformance according to GAMP. Therefore, those companies might face similar challenges to the industry partner. In addition, recent publications have reported the impacts of different TD on the development and maintenance of aPS. For instance, [46] reported TD related to work instructions at an electrical/electronics manufacturer. The quantitative findings in [47] revealed some potential triggers of TD related to the documentation at machine or plant manufacturers. Thus, a decent degree of generalisation could be achieved.

4 Overview of the analysis results

This section first reports an overview of the analysis results, followed by the typical TD use cases at aPS manufacturers. Following the idea of Li et al. [2], identified TD items from the use cases are classified into the phases in life cycles of aPS. Second, three use cases are selected and presented. Since the applied research method was a meta-analysis, the reported TD is mainly related to Documentation TD. Finally, an example of the delta between first and second risk evaluations in the GAMP 5 process is presented.

4.1 An enlargement of technical debt classification for validation of process control systems engineering

Based on the introduced meta-analysis method, an enlargement of TD classification in aPS manufacturing is achieved. There are 26 TD sub-types derived and generalised from 21 considered use cases. The TD sub-types are subsequently classified into 4 TD types. 25 of the 26 TD sub-types are new. The sub-types of TD are organised and presented in Fig. 2. The TD sub-types were named based on the causes of TD, according to the guidelines of Li et al. [2]. The TD sub-types relating to the risk evaluation process are grouped into: (1) Risk identification TD and Risk analysis TD within Design TD, and (2) Risk assessment TD within Documentation TD (illustrated with thick border boxes in Fig. 2). It could be observed that most new TD sub-types belong to Design TD and Documentation TD since the work was a meta-analysis on engineering documents. As some identified TD relate to a specific engineering phase such as design or manufacturing, RQ2.1 is addressed.

Fig. 2 Enlargement of Technical Debt classification for validation of process control systems’ engineering

4.2 Examples of TD use cases

Due to confidentiality, the results obtained are presented using two lab demonstrators. Firstly, the section introduces both demonstrators: the extended Pick and Place Unit (xPPU) [48], a community research demonstrator, and the MyJoghurt [49] demonstrator. Secondly, the section describes three examples of TD use cases in detail.

4.2.1 Introduction of xPPU and MyJoghurt demonstrators

The “extended Pick and Place Unit” (xPPU) represents a lab-size demonstrator in the SPP 1593 programme for production automation [48]. The xPPU consists of a stack providing the workpieces (WP), a crane to transport the WPs, a stamping unit, conveyors and ejectors to sort the different WP types into slides [48]. This paper focuses on the Linear Handling Module and its function Resorting Workpieces (cp. Figure 3).

Fig. 3 xPPU demonstrator [48]: Linear Handling Module, Resorting Workpieces

MyJoghurt [49] is a laboratory Cyber-Physical Production System. MyJoghurt consists of multiple modules: (1) a conveyor module for bottle transportation, (2) a handling robot for handling and commissioning, (3) preparation or tank control process modules, and (4) two filling stations with storage modules and separators [49]. This paper focuses on the bottle separation feature of the conveyor module (cp. Figure 4).

Fig. 4 MyJoghurt demonstrator [49]: (a) overview, (b) bottle separation feature

4.2.2 Example 1—Specifications versus standards

A comparison between design specifications and standards/industrial directives showed some discrepancies between requirements and standards (Documentation TD). For instance, the design specification used yellowish-green to represent the normal operating state (e.g. on the HMI). However, according to VDI/VDE 3699, yellowish-green is reserved for the prewarning state. In a follow-up discussion, the industrial partners stated that they were aware of the reported issues, but the company could not resolve these discrepancies due to specific customers’ wishes. In this case, the customer has forced the company to deviate from standards. The partners also raised concerns that significant efforts may be required to correct or maintain the inconsistencies in different systems sold to other customers if deviations are large. As an examination of design specifications successfully identified TD that was confirmed by the related experts, this example answers RQ1.1.

4.2.3 Example 2—Workpiece separation failure

The risk trace matrix document reported a sub-optimal manufacturing process due to insufficient sensor adjustments. A method is considered sub-optimal if certain desired qualities are not satisfied. For example, some NOK products could be assessed as OK ones due to insufficient instructions for (sensor) adjustment in operation (Documentation TD). Because of incorrect adjustments, the sensor delivers the wrong signal; thus, the NOK object is detected as OK (cp. Figure 5a). Consequently, the NOK object is separated into the wrong lane (cp. Figure 5b).

Fig. 5 Workpiece separation failure: (a) sensor delivering the wrong signal, (b) incorrect workpiece separation

It is worth noting that the “incorrect adjustment” incidents appeared several times in the reported document. Due to the “(sensor) adjustment” Documentation TD, some additional efforts might be required in other project phases such as assembly or operation. Therefore, the plant was operating at a sub-optimal level. As an analysis of the risk trace matrix document supported the identification of TD that was confirmed by the related experts, RQ1.2 is addressed.

This use case suggests that further analysis should be performed regarding “(sensor) adjustment” to assist with the risk assessment of change control in operation. Thus, the use case illustrates a typical example of risk management throughout the system life cycle (see Sect. 5.3 of GAMP 5 [5]). As a cross-stage engineering TD is identified and confirmed, this example answers RQ2.2. There were also use cases similar to this example, such as travel paths not being collision-free due to poor design practice.

4.2.4 Example 3—Workpiece manipulation failure

This example relates to the workpiece manipulation process. A disordered workpiece needs to be manipulated (e.g. sorted back) (cp. Figure 6a). However, the attempt of the handling module can be inaccurate due to compressed air failure or workpiece density (cp. Figure 6b1–2).

Fig. 6 Workpiece manipulation failure: (a) white workpiece needs to be sorted back, (b1) successful attempt of handling module, (b2) failed attempt of handling module

The analysis identified an intentional misjudgment at the second evaluation (Documentation TD) in this use case. The risk trace matrix noted different detection/prevention strategies being applied at different levels, such as defect/failure in compressed air supply and bugs in the software (OK/NOK strategy). Thus, the root cause might not have been identified yet, but the risk priority was reduced from High to Low/Medium, and some undesired additional procedures were added to the SOP (Standard Operating Procedure) for the customer. For instance, when the pneumatic supply returns, the machine sends a message asking the operator to check the production system and confirm on the HMI. The operators would need to check the location of the parts before continuing production. Therefore, additional effort might be required (e.g. operator training) because production has to compensate for a problem originating in the engineering phase.

There were two options for handling this failure. The first option could be the implementation of an automatic handler with a software update (e.g. temporarily changing the conveyor direction to move back the grey workpiece and then performing a second attempt). The effort for software implementation was discussed in previous work [37]. The second option, as presented, would be manual fault handling. There, the disordered workpiece is manually manipulated by the operator, who then needs to update the current state via the HMI. The effort could be calculated using HTA (hierarchical task analysis), a technique widely used to analyse human tasks by constructing a hierarchical task list associated with a specific process [50]. Further work would be required to compare the efforts required for these two options.
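As a minimal sketch of how an HTA-based effort estimate for the manual option might be set up, the following Python fragment models a hierarchical task list and sums the effort of its leaf tasks. All task names and durations are hypothetical and are not taken from the studied documents; the additive, purely sequential effort model is likewise an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """One node of a hierarchical task list (hypothetical structure)."""
    name: str
    seconds: float = 0.0  # effort of a leaf task; ignored if subtasks exist
    subtasks: List["Task"] = field(default_factory=list)

    def effort(self) -> float:
        """A leaf contributes its own duration; a parent sums its subtasks."""
        if not self.subtasks:
            return self.seconds
        return sum(task.effort() for task in self.subtasks)


# Hypothetical decomposition of the manual fault handling; values are invented.
manual_handling = Task("Handle disordered workpiece manually", subtasks=[
    Task("Check location of parts", seconds=30.0),
    Task("Reposition workpiece", subtasks=[
        Task("Open guard", seconds=10.0),
        Task("Sort workpiece back", seconds=20.0),
        Task("Close guard", seconds=10.0),
    ]),
    Task("Update current state via HMI", seconds=15.0),
])

total_seconds = manual_handling.effort()
print(total_seconds)
```

Multiplying such a per-incident estimate by the expected number of incidents would give a figure that could be set against the one-off software implementation effort, which is the comparison left to further work.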

4.3 Delta between first and second risk evaluations

On average, risk priorities are reduced from High to Low (cp. Figure 7). “Incorrect adjustment” appears several times in the risk analysis in different areas (purple columns). The delta between the evaluations (grey columns) appears to be high (only risks with both evaluations were considered). It should be noted that the system has three “Assembly” areas (A07, A13 and A16), and most occurrences of “Incorrect adjustment” are found in those areas. The results indicate the extent to which Documentation TD might have occurred in each area and where it appeared most often. Thus, further investigation is recommended in those areas.

Fig. 7 High delta between two risk evaluation rounds and Documentation TD at assembly areas
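The delta described in Sect. 4.3 can be illustrated with a small Python sketch. It assumes an ordinal encoding of the risk priorities (Low = 1, Medium = 2, High = 3) and averages the difference between the first and second evaluation per area, skipping risks that lack one of the two evaluations. The encoding, the field names and the example rows are assumptions for illustration only and do not reproduce the studied risk trace matrix.

```python
from collections import defaultdict

# Ordinal encoding of risk priorities (an assumption, not from the source).
PRIORITY = {"Low": 1, "Medium": 2, "High": 3}


def mean_delta_per_area(risks):
    """Return {area: mean(first - second)} over risks with both evaluations."""
    deltas = defaultdict(list)
    for risk in risks:
        first, second = risk.get("first"), risk.get("second")
        if first is None or second is None:
            continue  # only risks having both evaluations are considered
        deltas[risk["area"]].append(PRIORITY[first] - PRIORITY[second])
    return {area: sum(values) / len(values) for area, values in deltas.items()}


# Hypothetical rows; area labels reuse the assembly areas named in the text.
risks = [
    {"area": "A07", "first": "High", "second": "Low"},
    {"area": "A07", "first": "High", "second": "Medium"},
    {"area": "A13", "first": "Medium", "second": "Low"},
    {"area": "A16", "first": "High", "second": None},
]

result = mean_delta_per_area(risks)
print(result)
```

A larger mean delta for an area would flag it as a candidate for the further investigation recommended above.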

4.4 An evaluation with industrial experts

An evaluation via web meeting was conducted with two industry experts from the company that provided the engineering documents, in order to present and confirm the results. One expert was a specialist in Programmable Logic Controller programming and documentation. The other expert held a management position. Thus, the findings could be assessed from both technical and management perspectives. Since the presentation contained many details, the slides were sent to the experts as a PDF handout before the meeting so that they could compare the preliminary findings with observations from their own projects and provide feedback or corrections. The presentation included an introduction to the TD concept, the delta between first and second risk evaluations (cp. Figure 7) and the TD classification (cp. Figure 2). Two selected examples (reported in Sect. 4.2.2 and 4.2.3) were presented and confirmed. Convinced by the presented research results, the experts noted that they would need to conduct further discussions with quality personnel to review the significant difference between the two evaluation rounds. Thus, RQ3.1 and RQ3.2 were addressed.

5 Conclusion and outlook

This study uncovers the contagious character of TD. TD may not only spread throughout the life cycle of a manufacturing system but may also burden the operating company and the customers consuming food, beverages, or pharmaceutical products. The presented TD items are associated with risks of manufacturing systems in the healthcare sector, which requires specific safety-critical validation processes such as GAMP. Most of these systems operate for several decades. Thus, the research results are critical, as they could help to avoid harm to lives and to reduce the cost of TD. The reported TD use cases were presented to and confirmed by industrial experts. Therefore, the empirical evidence from this work contributes to the enlargement of the TD classification with TD sub-types from the validation of process control systems engineering, which needs to undergo a validation process according to GAMP. The research outcome regarding risk evaluation shows areas in which to improve OEE for the studied manufacturing system from a TD perspective. TD might significantly impact the OEE of automated production systems; thus, machine and plant manufacturers should start monitoring TD more closely.

The TD identification step reported in this paper is a prerequisite for achieving further TD management goals, such as supporting decision-making to indicate whether TD items are acceptable or the decision needs to be changed. In case many TD incidents are reported, but the resources or budget to resolve TD are limited, it would be helpful to know which categories of TD should be repaid first and which could be tolerated for some time. Unfortunately, the work on TD prioritisation is still limited [2]; therefore, future work should pay more attention to the TD prioritisation topic. The presented TD classification could assist a simple TD prioritisation method since the cost to handle the TD found in the later stages of the life cycle might be higher than in the earlier stages. A starting point for further work may be the results of [47] proposing a risk priority indicator.

The developed meta-analysis approaches can be reused as an analysis method for other work aiming at a similar purpose (i.e. TD identification using engineering documents). In addition, the presented meta-analysis techniques can be further applied to engineering documents in different domains. Thereby, the results from this work can be used as a reference as well as a benchmark.

Future work on TD in aPS can examine other documents such as manuals, maintenance logs or purchasing documents. The results could support aPS manufacturers and customers in increasing OEE further, thus improving manufacturing productivity.