Introduction

Identifying defects within manufacturing and assembly processes is critical to ensuring the ultimate quality, functionality, and aesthetics of a product (Azamfirei et al., 2023). This task involves meticulously detecting missing parts, misfitting components, deviations from specified shapes or tolerances, and surface imperfections such as texture anomalies, scratches, or adhesion issues. The significance of defect identification lies not only in maintaining overall product quality but also in addressing potential safety concerns, underscoring its pivotal role in the production domain (Sharma et al., 2023).

Visual inspection for defect identification, traditionally a human-driven endeavor, has undergone a transformative shift toward automation and CAD-based solutions with the advent of computer-aided technology and digital cameras (Newman, 1995). This evolution, dating back to the 1970s (Harlow, 1982), has been fueled by seminal papers highlighting the potential of automated solutions, driving the application of Vision Inspection Systems (VIS) across various industrial sectors (Bonnin-Pascual & Ortiz, 2019; Huang & Pan, 2015; Lupi, Pacini et al., 2023; Mordia & Kumar Verma, 2022; Yasuda et al., 2022). Formally, VIS can be defined as a set of existing off-the-shelf, commercial, or prototypical electromechanical hardware and information technology software components integrated in a novel way to solve inspection problems. The embedded technology includes contactless sensors across a broad electromagnetic wavelength band (e.g., X-ray, ultraviolet, visible, infrared, microwave, and multispectral), handling and lighting systems, as well as data models and computational architectures. VIS rely on the extraction and interpretation of information acquired from digital images via inspection algorithms (e.g., rule-driven and Artificial Intelligence (AI)/Machine Learning (ML)-driven) to make decisions (Lupi et al., 2023).

Despite these advancements, a notable challenge persists in the high cost and rigidity of traditional automated VIS. Hardware components are often tailored for specific applications, featuring inflexible architectures (i.e., lacking programmable mobile components) or requiring manual reconfiguration with substantial setup time. On the software side, significant constraints involve the manual low-coding effort required from proficient personnel to customize specific inspection algorithm parameters (Lupi et al., 2023a; Sun & Sun, 2015). Frequently used for large production volumes, specific product sets, and structured lighting environments, these systems often prove impractical for Small and Medium-sized Enterprises (SMEs) engaged in lower-volume, higher-variance production and more unstructured environments.

To bridge this gap, the concept of a framework for Flexible Vision Inspection System (FVIS) has recently been introduced in the literature, emphasizing modular inspection hardware and software with reconfigurable capabilities through seamless reprogramming via software (Lupi et al., 2023). Building upon subsequent advancements, a modular framework for autonomous FVIS based on self-X capabilities has been presented by the same authors as a fully flexible hardware and software solution capable of handling complex generative inspection instances (i.e., different product families) (Lupi et al., 2024). This system represents an ambitious attempt to explore the highest Level of Automation (LoA) (Peres et al., 2020; Vagia et al., 2016) for VIS, unlocking intelligent or cognitive capabilities for autonomous reconfiguration of software and hardware components via software reprogramming, starting from the digital information in the 3D CAD file of the products.

Despite the promising directions proposed in (Lupi et al., 2024), the self-X modules and their integration into the overall framework are presented at a proof-of-concept stage, lacking a comprehensive end-to-end self-reconfiguration pipeline implementation and integration. To successfully achieve a higher Technology Readiness Level (TRL) for industrial implementations, there is a need to fill this gap by better integrating the different technologies of the proposed system to deliver the intended scalability and flexible plug-and-play features for SMEs. To the best of the authors’ knowledge, this work marks the first attempt of its kind.

Toward this aim, the current paper begins with a wide perspective on theoretical aspects (in Section “Overview of foundational theoretical concepts”), including key literature in the field, providing necessary background and context within the scope of the article for the reader’s understanding of the novelties proposed in the new framework presented in Section “Updated framework”. Transitioning from theory to practice, results are presented in Section “Experimental results”, implementing the developed framework for a laboratory case study. Discussion is provided in Section “Discussion and future developments”, concluding with final remarks in Section “Conclusions”.

Overview of foundational theoretical concepts

This section provides an overview of the foundational theoretical concepts addressed in this paper. Section “Enriched CAD for downstream visual inspection” highlights the importance of enriched 3D CAD files in Model-Based Definition (MBD) for Model-Based Enterprises (MBE), with a specific focus on visual inspection. In Section “Evolution of VIS within manufacturing systems”, we delve into the concepts of flexibility, reconfigurability, digital thread, and autonomy in visual inspection within the manufacturing domain, tracing the evolution from initial dedicated systems to the development of more recent solutions aimed at addressing current production challenges. Transitioning to Section “The Reference Framework”, we provide a detailed overview of the reference framework for this work, which serves as the foundation for the proposed enhancements.

Enriched CAD for downstream visual inspection

CAD files play a pivotal role in modern manufacturing industries, where digitizing knowledge across design, process planning, production, and inspection processes is increasingly crucial (Feng et al., 2017). As product lifecycle applications grow in variability and complexity, CAD and related Product Data Management (PDM) systems serve as central pillars, enabling seamless integration throughout the entire product development process (Feeney et al., 2015). However, the traditional method of transferring information between design, manufacturing, and inspection phases can lead to redundancies and errors in documentation, underscoring the importance of robust data representation standards, especially in MBE rooted in digital manufacturing (Hedberg et al., 2016; Qiao et al., 2023). Over time, the integration model for product lifecycle data has evolved from 2D engineering drawings to fully detailed 3D digital models (i.e., drawing less), known as MBD or Digital Product Definition (DPD) (Hedberg et al., 2016).

In pursuit of this objective, the International Organization for Standardization (ISO) has released two focal standards, namely ISO 16792 (ISO 16792:2021 - Technical Product Documentation — Digital Product Definition Data Practices, 2024) and ISO 10303 (ISO 10303-1:2024 - Industrial Automation Systems and Integration — Product Data Representation and Exchange — Part 1: Overview and Fundamental Principles, 2024), providing comprehensive explanations of computer-interpretable representation of product information for the exchange of product data during the product lifecycle, in conjunction with annotation practices for CAD (ISO 10303-242:2022 - Industrial Automation Systems and Integration — Product Data Representation and Exchange — Part 242: Application Protocol: Managed Model-Based 3D Engineering, 2024).

Within the context of MBE, annotated or enriched 3D CAD models (i.e., 3D CAD models augmented with additional information during the design phase) play a critical role in enhancing the exploitation of latent potential within CAD model utilization for intelligent manufacturing enabled by the “digital thread” (Morse et al., 2018; Nzetchou et al., 2022). Enriched 3D CAD models encompass two main categories of information:

  • Geometric details (e.g., pixel/voxel data, facet representation, precise surface, solid data, as well as the 3D model’s hierarchical structure composed of sketches, modeling features, and parameters (CAD model tree)).

  • Semantic details (e.g., material properties, assembly-related details such as positioning and constraints, as well as Product and Manufacturing Information (PMI), which includes 3D Geometric Dimensioning and Tolerancing (GD&T) conditions, along with textual annotations for design, manufacturing, and inspection purposes). PMI falls into two main classes, namely computer-readable (i.e., PMI representation, also known as semantic PMI) and human-readable (i.e., PMI presentation, or graphical PMI).

The practice of incorporating PMI within the 3D CAD models to create a fully defined digital model as the single source of truth for product design, manufacturing, inspection, and fulfillment has recently gained recognition from several academics (Agovic et al., 2022; Company et al., 2023; Hallmann et al., 2019; Lipman & Lubell, 2015; Lupi et al., 2024; Minango & Maffei, 2023; Mohammed et al., 2022; Quintana et al., 2012; Thomas et al., 2021). The semantic richness of 3D CAD models has unveiled new opportunities by applying the evergreen concurrent engineering and Design for X (DfX) approaches, aiming to reduce the time to market (e.g., by directly adding inspection features to the geometrical entities during the design phase in the 3D software modelers). According to the National Institute of Standards and Technology (NIST), MBD can shorten the design-to-manufacturing-to-inspection process by about 75% (Hedberg et al., 2016).

The shift toward PMI for MBD is also evident in the seamless integration of dedicated tools into the latest versions of 3D CAD software modelers from leading vendors worldwide, streamlining the capture and exchange of MBD data (e.g., Autodesk Inventor (Model-Based Definition (MBD) | Autodesk, 2024), Dassault SolidWorks (SOLIDWORKS MBD: Model-Based Definition Capabilities, 2024), Siemens NX (Geometric Dimensioning & Tolerancing NX MBD | Siemens Software, 2024), and PTC Creo (Model-Based Definition, 2024)). However, native formats are often closed and not fully interpretable, in line with the legacy strategy of software vendors (Qiao et al., 2023). Therefore, neutral CAD formats such as Standard Tessellation Language (STL), Initial Graphics Exchange Specification (IGES), and Standard for the Exchange of Product model data (STEP) have gained attention, with STEP AP242 (ISO 10303-242:2022 - Industrial Automation Systems and Integration — Product Data Representation and Exchange — Part 242: Application Protocol: Managed Model-Based 3D Engineering, 2024) highlighted as the most widespread for semantic enrichment (Feeney et al., 2015; Nzetchou et al., 2022).

In this context, the automatic extraction of computer-readable PMI data from enriched CAD STEP AP242 3D models has been proposed (e.g., for inspection (Lupi et al., 2024)). However, although the CAx Interoperability Forum (CAx-IF) (CAx Interoperability Forum, 2024) has defined recommended practices for interoperable data exchange using STEP files (Lipman & Lubell, 2015), and some schemas are available (STEP ARM Model, 2024), the automatic extraction of PMI information from enriched CAD STEP AP242 3D models based on the EXPRESS language (ISO 10303-11:2004 - Industrial Automation Systems and Integration — Product Data Representation and Exchange — Part 11: Description Methods: The EXPRESS Language Reference Manual, 2024) still remains an open research topic (Thomas et al., 2021).

Evolution of VIS within manufacturing systems

In the realm of manufacturing, as illustrated in Fig. 1, flexibility concepts, exemplified by Flexible Manufacturing Systems (FMS), emerged in the early 1980s as a pivotal evolution from prior mass production environments rooted in Dedicated Manufacturing Systems (DMS), which were dedicated to a single kind of product and relied on Make-to-Stock (MTS) business models (Browne et al., 1984; Chiera et al., 2021; Koren, 2006). As defined by Browne et al. (1984), flexibility denotes a system’s capability to reprogram its hardware and control components through software parametrization, enabling the handling of generative production problems encompassing products from entirely diverse families and volumes (Yadav & Jayswal, 2018). These emerging production paradigms embraced Engineering-to-Order (ETO) business models.

Despite the advancements introduced by FMS, challenges such as low ramp-up efficiency, high costs, and low throughput paved the way for Reconfigurable Manufacturing Systems (RMS), first proposed by the University of Michigan with Prof. Koren in 1999 (Koren et al., 1999). Reconfigurability refers to a system’s capability to cost-effectively alter and reorganize its modular components (hardware or software) on a repetitive basis to swiftly adapt the production of different variants of products within similar families (Koren et al., 1999; Mehrabi et al., 2000). RMS marked the beginning of a shift to Configure-to-Order (CTO) as the target business model for production systems. Early successful attempts to extend the RMS platform toward more autonomous behaviors on the shop floor include highly dynamic production paradigms such as Holonic Manufacturing Systems (Van Brussel et al., 1998) and Bio-inspired Manufacturing (Ueda et al., 1997), proposed in the late 1990s.

In 2011, the German government spearheaded Industry 4.0 (I4.0), unlocking the full potential of FMS/RMS production paradigms.

Fig. 1 Performance dimensions (i.e., lead time, variants, turbulent markets) pushing the boundaries of the main concepts of this work and their evolution, transitioning from dedicated manufacturing to flexible, reconfigurable, and digital manufacturing systems (representative years highlighted) toward the current integration of some inherited properties into more recent production environments based on autonomous manufacturing rooted in servitization and plug-to-order business models (Maffei, 2012). According to the diffusion of innovation theory (Rogers et al., 2014), the control, hardware, and business model dimensions are depicted with interrupted arrows to indicate disruptive changes in their diffusion. The dotted light-blue line highlights the current early stage of autonomous systems

It associated enabling technologies to champion the digitalization of manufacturing (Lee et al., 2015), also known as smart manufacturing (Peres et al., 2020; Zhong et al., 2017), concurrently emphasizing more efficient and sustainable production approaches (Mabkhot et al., 2021; Xu et al., 2021). This digital transition included advanced cyber-physical integration (Yuan et al., 2022) and the surge in smart self-X capabilities (Barari et al., 2021), and opened the floor for higher LoA toward Evolvable and Autonomous production systems (Maffei et al., 2010). These systems have been grounded in industrial data-driven (Tao et al., 2018) and Artificial Intelligence (AI)-powered decision-making (Lee et al., 2018), usually implementing Multi-Agent Systems (MAS) (Ribeiro, 2015), where different agents represent different components on the shop floor, such as machines, stations, or products, working collaboratively to respond to ongoing requests and existing disturbances. Some of these works include ADACOR2 (Barbosa et al., 2015), IDEAS (Ribeiro et al., 2011), BIOSOARM (Dias-Ferreira et al., 2018), and PRIME (Antzoulatos et al., 2017; Rocha et al., 2014). Another apex marked by the digital transition induced by I4.0 concerns CAD data and related Computer-Aided Engineering (CAE) applications, which reached their maximum potential with the introduction of the digital thread concept, empowering MBD manufacturing for comprehensive data traceability in a collaborative production domain (Feeney et al., 2015).

These novel production environments concurrently ensure human-centricity, augmenting rather than replacing human roles (Fantini et al., 2020; Xu et al., 2021), while introducing recent business models related to manufacturing-as-a-service and plug-to-order philosophies (Onori et al., 2013). As depicted in Fig. 1, this represents an evolution of CTO approaches, where the properties inherited from flexible, reconfigurable, and digital systems enable agile and shared manufacturing economies. These systems address challenging demands, including increased product variants and production volumes, dynamic market changes, and reduced lead times (Bortolini et al., 2018; Buerkle et al., 2023; Kusiak, 2023; Lupi et al., 2023b).

Viewed from a historical standpoint, VIS have undergone a parallel evolution, constituting a subset of manufacturing systems. However, many documented instances of VIS still rely on conventional, dedicated systems (Harlow, 1982; Newman, 1995; Sharma et al., 2023). Similar to the challenges faced by traditional manufacturing systems in adapting to new paradigms (depicted in Fig. 1) (Barata et al., 2007), conventional VIS struggle with transitioning toward the autonomous stages. Only a few studies in the literature address this transition.

Initial endeavors to bridge the gap for flexible hardware (Gospodnetic et al., 2020; Lee & Chan, 1996) and software (Chung et al., 2011; Joshi et al., 2020) in VIS were proposed. Other studies aimed to provide reconfigurable VIS solutions (Barhak et al., 2005; Pistone et al., 2019), alongside laying the foundational basis for the automated generation and reconfiguration of visual inspection algorithms for reconfigurable VIS (Garcia & Villalobos, 2007), also called adaptable automated VIS (Sun & Sun, 2015). More recently, Lupi et al. defined a novel framework for Reconfigurable FVIS (Lupi et al., 2023). In doing so, the authors recognized the inherent limitations associated with achieving actual flexible hardware for generative inspection problems. In this context, software flexibility leverages Computer-Aided data for Automatic Feature Recognition (AFR) techniques to extract inspection information from CAD and automatically reparametrize the system in accordance with the extracted information, facilitating the quick ramp-up of a modular hardware architecture, which, for instance, can be actuated and reprogrammed through software commands. This work introduced the concept of the ReCo file, serving as the link between the digital and physical worlds to reconfigure the NC components (hardware) of the VIS and the parameters for the inspection algorithms (software) seamlessly and without low-coding human input requirements (Lupi et al., 2023). As emphasized by the authors, the ReCo file can streamline the reconfiguration process for similar products. This follows a variant approach inspired by the variant process planning outlined for Computer-Aided Process Planning (CAPP), assuming that similar parts require similar inspection tasks and, consequently, a similar ReCo file.

As the latest development for next-generation VIS, the concept of autonomy has recently garnered attention (Kaiser et al., 2022). Recent work has initiated CAD-based autonomous VIS, aiming to achieve a long-term vision of a reconfigurable (i.e., modular) and fully flexible system (with reprogramming capabilities for both hardware and software modules). This system would handle generative inspection problems without the need for highly skilled operators, leveraging the potential of human-cyber-physical systems and the disruptive capabilities of the ReCo file (Lupi et al., 2024).

The reference framework

The theoretical framework outlined in our prior research aimed to provide a comprehensive conceptual blueprint, serving as the manifesto for a nascent research domain focused on advancing autonomous and reconfigurable FVIS based on enriched 3D CAD files for digital-thread-enhanced production environments (Lupi et al., 2024). This proposed framework exhibits a modular architecture incorporating diverse self-X, or smart, capabilities, facilitating an adaptable hardware system (i.e., a robotic arm with sensing apparatus) and software control (i.e., CAD-based) to execute visual inspection tasks, catering to both specific product variants and multiple product families.

In Table 1, we present the primary modules with their respective objectives, input/output information, current TRL, and identified research challenges. For comprehensive details, refer to the source article (Lupi et al., 2024).

Despite the promising direction of the framework presented in Table 1, characterized by its digital nature, modular architecture, flexibility, and autonomous features, it remains at a proof-of-concept stage. As shown in the Table, there is a pressing need for better integration of the various technologies introduced across the modules to deliver a higher-TRL system suitable for industrial environments and a more comprehensive reconfigurable approach. Toward this aim, this paper addresses the following Research Questions (RQs):

RQ1: “How can we enhance the TRL of the existing modular reference framework to develop a more integrated (and hence reconfigurable) end-to-end pipeline using an enriched 3D CAD STEP AP242 file as input to generate a ReCo file as output?”.

RQ2: “How can we formally define a ReCo file for practical implementations?”.

Table 1 The reference framework and its modules with specific objectives, input/output, TRL and research challenges for each

Updated framework

This section presents a novel conceptual blueprint for next-generation flexible, reconfigurable, CAD-based, and autonomous VIS. It specifically delves into the process of generating the ReCo file and its associated data flow. While this section adopts a rigorous and theoretical approach to ensure methodological repeatability, detailed implementation decisions are deferred to Section “Experimental Results”. The main research challenges outlined in Table 1 have been addressed to enhance the generality of the framework and its broader applicability, leveraging the same modules and architectural principles reported in Section “The Reference Framework” but with several improvements.

Figure 2 graphically illustrates the primary innovations of this work, depicted within the green area (Module_2, Module_3a, Module_3b, Module_4), representing the proposed CAD to ReCo file pipeline. Module_5 has been eliminated since the ReCo file is now iteratively generated during the pipeline. The modification resulting from the new pipeline also impacts Module_1, necessitating the introduction of a new entity, namely the Config. file. The red dotted area emphasizes the updated framework, integrating the new pipeline and the modified Module_1.

Fig. 2 The red dotted area denotes the scope of the current paper as an adapted version of (Lupi et al., 2024). The new pipeline, capable of transitioning from 3D CAD to ReCo file via integrated modules with APIs, is shown in the green area. Human and autonomous-based processes are labeled with icons. Module_1 represents the design and setup of the system, which differs from the others (i.e., it is outside the autonomous pipeline) because it is a one-time process involving significant human input. The top side showcases MBD, with the 3D CAD model as the single point of truth for the Design, Manufacturing, and Inspection processes (grey boxes). Dotted arrows represent the digital thread of information flow. Thick black dotted lines highlight temporary information. Thick black solid lines highlight the main static file transitions, top-to-bottom (end-to-end) from the 3D CAD STEP AP242 file to the ReCo file, and left-to-right from the Config. file generated in Module_1 to the Inspection report. Thick white arrows represent the API integrations between the modules within the pipeline

From a high-level viewpoint, the framework entities and workflows can be interpreted as follows. During the design process, illustrated as human-centric activity on the top-right side of Fig. 2, the designer collaborates with the industrialization team to define the digital geometry of the product. Subsequently, visual inspection features, whether textual or dimensional PMI, are manually appended to specific surfaces to be monitored during quality inspection. Each inspection feature includes specific (x, y, z) coordinates and versors, identifying its position and orientation within the 3D {CAD} reference frame.

As depicted in Fig. 2, once the 3D CAD model is exported from the 3D modeling software into STEP AP242 format, the pipeline autonomously generates the ReCo file. While the temporal execution flow of the modules progresses from left to right, the information flow originating from the STEP file to the ReCo file (depicted in solid thick black boxes) is represented top to bottom using dotted arrows. This ReCo file, once generated, serves to automatically execute the inspection process on the physically manufactured product. Within the framework, Module_1 serves as a human-in-the-loop integration point, generating a Config. file (represented in a solid thick black box), which subsequently feeds the pipeline’s modules and the inspection process (horizontal information flow). The Config. file results from the hardware/software integration of Module_1 and the subsequent setup and calibration to enable sensing and moving capabilities of the sensors in the 3D space, as well as the accurate adoption of vision inspection parameters according to the specific inspection setup (e.g., the Config. file may differ for various product families but remain consistent for different product types within the same family).

With the groundwork laid out, the new framework operates as a fully integrated pipeline that facilitates module interoperability via Application Programming Interfaces (APIs), as depicted by the white arrows.

Module_2 self-extracts the inspection information stored in the STEP file using reasoning AI (e.g., Rule-based Reasoning (RBR), Case-based Reasoning (CBR), or Large Language Models (LLMs)). The output comprises a structured list of inspection features, including PMI, geometric parameters, origin, and versors in the {CAD} frame, temporarily saved in the ReCo file. Part of this information is simultaneously accessed by Module_3a and Module_3b.
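By way of illustration, a minimal sketch of one entry of this temporary feature list is reported below; the field names follow those shown later in Figs. 4 and 6 (ID, PMI, param, origin, versor), while the values are placeholders:

# Minimal sketch of one entry of the temporary ReCo content written by Module_2.
# Field names follow those shown in Figs. 4 and 6; the values are illustrative placeholders.
feature_entry = {
    "ID": 1,
    "PMI": "presence",               # textual annotation extracted from the STEP file
    "param": {"Radius": 15.0},       # geometric parameters of the referenced surface
    "origin": [120.0, 45.0, 60.0],   # (x, y, z) in the {CAD} frame, mm
    "versor": [0.0, 0.0, 1.0],       # outward normal of the surface to be inspected
}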

Module_3a pulls the information concerning PMI and geometric parameters to select the most suitable visual inspection algorithm for each specific feature using AI reasoning techniques. This involves accessing a built-in library of vision inspection algorithms, which provides a range of rule-based and machine learning (ML)-based vision algorithms, along with the required parameters for these algorithms. Additionally, Module_3a receives supplementary information from the Config. file, including the mm/pixel calibration, threshold values for binarization, and pretrained ML parameters that were defined experimentally during the setup and calibration stage of Module_1. After applying AI reasoning techniques for algorithm selection and parameterization, Module_3a records temporary information in the ReCo file. This includes a list of vision inspection algorithm names and their related parameters, which are to be instantiated for every feature.

On the other hand, Module_3b retrieves the information concerning the origin and orientation of the visual inspection features for vision sensor positioning. It utilizes the artificial point cloud generated from the STEP file by sampling points on its surface, as well as the actual point cloud physically sampled by the 3D scanner chosen for the sensing hardware system (if any). According to the sensor setup and 3D calibration of Module_1, if a hand-eye solution is selected, the transformation matrix from robot base {B} to robot flange {F} to camera {C}, established during the calibration setup, is passed via the Config. file. All this information is utilized within the module to register the artificial point cloud with the actual point cloud and convert the origins and versors from {CAD} to {B}. Additionally, a camera offset can be included in the Config. file to feed Module_3b. The established distance from the inspection surface defines the mm/pixel calibration. With this information, Module_3b applies offsets to the points and rotations to the versors to define the position and orientation of the camera (e.g., facing the specific feature with a given offset and orthogonally), and temporarily writes the resulting values to the ReCo file.

Module_4 is designed to reorder the list of inspection features (randomly ordered depending on the STEP file and the extraction logic in Module_2), following a specific path and handling strategy (if any), ultimately generating the ReCo file as an inspection routine (i.e., a sequence of chronologically ordered granular tasks). These strategies, defined in the Config. file, fall into two main classes: “exact” (or manual-based) and “guess” (or automatic/autonomous-based, depending on the LoA). Users can define the exact routine for inspecting the annotated features by defining a list of subsequent tasks and passing this routine to Module_4, which then applies the new chronological order of positions/orientations of the camera (from Module_3b) and the specific algorithms and parameters (from Module_3a) for every given feature. In cases involving handling and rotation tasks, Module_4 calculates new orientations/positions for the features to be inspected after product rotation, modifying the values from Module_3b. If the Config. file specifies the “guess” strategy, Module_4 can propose an optimal routine based on specific objective functions (e.g., shortest path, weighted path, mixed, or custom approaches). The output of Module_4 is the final ReCo file, serving as a static source of information output from the pipeline.

As illustrated in Fig. 2, the actual inspection process performed by the motion and vision programs is intentionally decoupled from the preceding modules (i.e., those within the red dotted area). However, it remains logically connected, as it is initiated by reading the ReCo file output from the CAD to ReCo file pipeline, along with the information stored in the Config. file (e.g., robot kinematics, end effectors, offsets for the 2D camera), and by accessing the vision inspection library.

During motion planning, there may be instances where the inspection routine generated by Module_4 cannot be executed for certain points/orientations (e.g., outside the working volume of the hardware). In such cases, a human corrective loop can be initiated to redefine the Config. file with an “exact” routine, which is built upon the guessed routine (e.g., including rotation of the product to access the inaccessible feature). If the hardware has sufficient working volume and can freely access any feature, the guessed routine will be directly applicable, and the motion planning software will ensure collision avoidance with the product (e.g., utilizing the bounding box information provided in the Config. file). The output of the inspection process is an automatically generated inspection report.

Experimental results

In this section, we present the technical solutions adopted to practically implement the theoretical framework proposed in Section “Updated Framework”. To enhance readability, we employ a straightforward use case to showcase the integration of each module and the overall system performance. This approach ensures a clear understanding of the primary contributions of the article while maintaining accessibility by avoiding unnecessary complexity associated with a generative inspection scenario.

Use case for the updated framework implementation

The selected product variants for the use case implementation consisted of two wooden box furniture components assembled with circular metallic elements on five out of the six surfaces. The distinction between the two variants lies in the diameter of the circular element on the top side surface, while the remaining four surfaces presented identical circular elements. Figure 3 (a) visually represents the two different variants.

For both variants, PMI annotations were applied to the 3D model using Autodesk Inventor Pro 2023 dimensional and leader text annotations (Autodesk Inventor Software 2024, 2024). The STEP AP242 file was exported using the export command of Inventor Pro, which adheres to the ISO 10303-21 standard and embeds the STEP Tools, Inc. file header (STEP Tools, Inc. - Digital Thread, STEP and IFC Solutions, 2024), referencing the CAx-IF Recommended Practices for the Representation and Presentation of Product Manufacturing Information (PMI), v4.0, 2014-10-13 (CAx Interoperability Forum, 2024).

Figure 3(b) illustrates the PMI annotations on one of the two variants.

Fig. 3 (a) Perspective view of the two variants of the product under inspection. The difference between the two variants lay in the top-side element, while the remaining four side surfaces presented identical circular elements. (b) Example of the adopted graphical PMI (dimensional and textual annotations) for one variant in wireframe visualization mode, subsequently exported as a STEP AP242 file from Autodesk Inventor Pro 2023

Implemented modules

Figure 4 represents the schema of the implemented approach theoretically presented in Fig. 2, using the same graphical elements and connections, expanded with specific operational decisions. For a step-by-step reading and interpretation of the figure, the reader can follow the logical breakdown of Fig. 4 while referring to the description of Fig. 2.

From an operational perspective, the proposed solution leveraged the Python programming language and the FastAPI library to interoperate and exchange information between the various modules of the pipeline, allowing the ReCo file to be generated dynamically in the form of a JSON file. The evolving temporary JSON content is shown in the middle of Fig. 4, highlighted within dotted boxes (light gray arrows indicate the specific evolution of the information). The final ReCo file contained the inspection routine to be executed in the inspection process. The inspection process was carried out through (i) motion tasks developed using a Virtual Machine (VM) embedding the Robot Operating System (ROS), and (ii) vision tasks, also interoperating via an API.
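As an illustration of this integration pattern, the following minimal sketch shows how one pipeline module can expose a FastAPI endpoint while an upstream module pushes its output forward over HTTP; the endpoint name, port, and payload shape are assumptions for illustration, not the actual implementation:

from typing import Any
from fastapi import FastAPI

app = FastAPI()
received_features: list[dict[str, Any]] = []

@app.post("/module3a/features")
def receive_features(features: list[dict[str, Any]]):
    # Store the incoming feature list and acknowledge, so the caller can trigger the next task.
    received_features.extend(features)
    return {"status": "received", "count": len(features)}

# In the upstream module (e.g., Module_2), the extracted features could be sent with:
#   import requests
#   requests.post("http://127.0.0.1:8000/module3a/features", json=feature_list)
# and the server started with: uvicorn module3a:app --port 8000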

The main reason for the API implementation was to enable communication between the various modules inside the pipeline proposed in Fig. 2 and to avoid reading-subscription conflicts (i.e., to communicate the accomplishment of one task and the possibility to trigger the following one), as well as to build the procedural order of granular tasks during the inspection process. From a technical implementation perspective, Module_1 had to run on the same machine where the pipeline modules were running, so that the other modules could read the content of the Config. file. Moreover, the motion program developed in the VM (with ROS installed) for the inspection process needed access to that folder.

Fig. 4 The implemented framework presented in Fig. 2, using the same graphical elements enriched with the adopted technical solutions: modules of the pipeline (green boxes), Module_1 (white), and the inspection process (gray); input and output information (dotted lines); file examples within thick solid boxes; temporary information within thick dotted boxes (light gray arrows and boxes show the evolution of the information content); and APIs for integration between the modules (thick white arrows) with the request/put functions

Module_1

In line with the hardware/software configuration outlined in the reference framework in Section “The Reference Framework”, this study employed an anthropomorphic 6DOF ABB IRB120 robot (IRB 120 | ABB, 2024) controlled via ROS (ROS: Home, 2024) integrated with an eye-in-hand Intel Realsense D415 RGBD camera (Depth Camera D415 – Intel® RealSenseTM Depth and Tracking Cameras, 2024), and a custom parallel gripper. Figure 5 represents the integrated system and its main components. Table 2 details key information about the components.

Fig. 5 (a) Robot in the home position with the integrated physical components highlighted via labels. (b) The digital twin of the system with the main reference frames highlighted (i.e., camera {C}, base {B}, flange {F}) in the ROS visualization (RViz) environment

A notable advancement in this implementation was the incorporation of an all-in-one RGBD camera, capturing both depth and 2D images, alongside the integration of an end-effector for product handling (e.g., rotation) in cases where the feature to be inspected was inaccessible. Specifically, a Config. file data structure was developed to encapsulate essential hardware/software calibration and setup information. This Config. JSON file effectively instantiated the theoretically presented information in Fig. 2 for the specific application, serving as the output of Module_1 (depicted in Fig. 4) and detailed as follows.

Config. Information for motion

The Config. information for motion included the following items:

  • Robot description, comprising the XML Macros (Xacro) file in Unified Robot Description Format (URDF). Xacro is a ROS file format used to simplify the XML code of the URDF. The URDF file describes the physical and visual aspects of the robot, including geometry, kinematics, and dynamics. A Semantic Robot Description Format (SRDF) file was also used to complement the URDF by specifying the semantic information of the robot, such as the configuration of move groups. Hence, it was possible to describe the end effector for ROS (i.e., the move groups of gripper and camera), facilitating motion tasks in the inspection process to accurately locate the camera.

  • Quaternions and translation of the robot end effector (i.e., camera group) in the home position, extracted from the controller during the calibration setup, enabling Module_3b to calculate the transformation matrix from {C} to {B}.

  • Path strategies, categorized under the “Path” dictionary and essential for Module_4, encompassing “method” values of “exact” or “guess”. For “exact”, a sequence denoted “choice” (e.g., [5, 4, “rotate”, 3, 2, 1]) indicated the original IDs of the features manually ordered chronologically, including granular tasks like “rotate” when necessary. In the case of “method” = “guess”, no specific sequence was indicated in the Config. file, and Module_4 autonomously determined the task order based on a custom approach. The rotation value in radians (i.e., 1.57) was utilized to rotate feature IDs 3, 2, and 1 in Module_4.

  • Surface offset of the camera relative to the surfaces to be inspected (i.e., 155 mm), accounting for autofocus and Field of View (FoV) constraints (to capture specific defects).

  • IP address of the gripper, used to open and close it in the motion program during the inspection process, together with the host IP address for the communication between the motion program and the vision program during the inspection process.

Table 2 The main components highlighted in Fig. 5, with their characteristics, cost, setup and calibration activities, and reference datasheets
Config. Information for vision

For the ML-based vision inspection parameters, the “best_V5.pt” file name and a confidence threshold of 0.7 were stored in the Config. file. For rule-based vision, the parameters encompassed a binarization threshold of 110 and a calibration factor of 9.0 pixel/mm, calculated considering the defined orthogonal offset. For all algorithm types, a parameter in the Config. file defined the light state of the end effector: for ML-based algorithms it was set to on, while for rule-based algorithms it was set to off. The IP address of the lighting system, used to activate/deactivate the lights according to specific conditions, was included as well.
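To make the data structure concrete, a minimal sketch of the Config. JSON combining the motion and vision settings described above is reported below; the key names, nesting, and placeholder values (e.g., poses and IP addresses) are illustrative assumptions, not the actual file:

import json

config = {
    "Robot": {"xacro": "irb120.xacro", "srdf": "irb120.srdf",                # robot description files
              "home_pose": {"quaternion": [0.0, 0.0, 0.0, 1.0],             # placeholder values
                            "translation": [0.30, 0.00, 0.50]}},
    "Path": {"method": "guess", "choice": [], "rotation": 1.57},            # routine strategy for Module_4
    "Camera": {"surface_offset_mm": 155},                                   # stand-off for autofocus/FoV
    "Gripper": {"ip": "192.168.0.10"},                                      # placeholder IP address
    "Vision": {"ml": {"preTrainedModel": "best_V5.pt", "Confidence": 0.7, "Light": "on"},
               "rule_based": {"Threshold": 110, "Calibration_factor": 9.0, "Light": "off"},
               "light_ip": "192.168.0.11"},                                 # placeholder IP address
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)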

Despite the generic nature of the vision inspection algorithm library, and the fact that building this repository constituted a separate task from Module_1, the experimental calibration and setup of the environmental parameter values stored in the Config. file were implicitly linked to these algorithms. Hence, we briefly report how we built the library and calibrated the algorithms.

Rule-based shape detection

The algorithm read an image via the Intel RealSense Software Development Kit (SDK) 2.0 (Intel RealSense SDK 2.0, 2024), converted it to grayscale, and applied a 5 × 5 Gaussian blur. The image was then converted to the HSV color space, extracting the H, S, and V channels. Binarization was applied to the S channel with a fixed threshold of 110, experimentally tuned based on the lighting conditions. Morphological closing (erosion, dilation) was applied. Blob analysis detected shapes (circles), extracting centroids and pixel diameters, which were converted to millimeters through the defined pixel/mm calibration factor.
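A minimal OpenCV sketch of this processing chain is shown below; it applies the blur on the color frame before the HSV conversion and uses generic contour analysis for the blob step, so it should be read as an approximation of the implemented algorithm rather than the algorithm itself:

import cv2
import numpy as np

def detect_circles(image_bgr, threshold=110, calibration_px_per_mm=9.0):
    # Blur, convert to HSV, and binarize the saturation (S) channel.
    blurred = cv2.GaussianBlur(image_bgr, (5, 5), 0)
    hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)
    _, binary = cv2.threshold(hsv[:, :, 1], threshold, 255, cv2.THRESH_BINARY)
    # Morphological closing to remove small holes before blob analysis.
    closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    results = []
    for contour in contours:
        (cx, cy), radius_px = cv2.minEnclosingCircle(contour)
        results.append({"centroid_px": (cx, cy),
                        "diameter_mm": 2.0 * radius_px / calibration_px_per_mm})
    return results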

ML-based texture defect detection

YOLOv5x and YOLOv8x were compared in detecting glue on product surfaces, with YOLOv5x outperforming (precision: 95%, recall: 86%, mAP50: 92%, mAP50-95: 58%). The initial set had 142 tagged images, preprocessed, and resized to 640 × 640. Augmentation increased it to 378 images via flips and rotations using Roboflow (Roboflow, 2024). Training used default hyperparameters, 41 epochs, and a final set of 360 images, with 18 for validation. Testing was performed on a separate set. The dataset is public (Roboflow Data, 2024).
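At inspection time, the selected weights can be loaded through the standard YOLOv5 PyTorch Hub interface, as in the following sketch (the path and the way the result is consumed are illustrative; the actual vision library may wrap this differently):

import torch

# Load the custom-trained YOLOv5 weights referenced in the Config. file.
model = torch.hub.load("ultralytics/yolov5", "custom", path="Defects ML/best_V5.pt")
model.conf = 0.7  # confidence threshold from the Config. file

def detect_glue_defect(image_rgb):
    results = model(image_rgb)             # inference on a single frame
    detections = results.pandas().xyxy[0]  # bounding boxes with class and confidence
    return len(detections) > 0, detections # (defect detected, detection details)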

Module_2

Throughout the product design phase, semantic PMI (textual annotations and dimensional tolerances) was manually integrated into the CAD model using the built-in annotation features of the CAD modeling software, such as the leader annotations in Autodesk Inventor Pro 2023. The enriched 3D CAD model was subsequently exported in the standardized STEP AP242 format, as indicated in Section “Use case for the updated framework implementation”.

The machine-readable nature of the semantic PMI allowed for the development of a Python script dedicated to extracting the annotated 3D leaders, dimensional features, and referenced geometries, storing this data in a JSON file. The extraction process was devised through a meticulous examination of the STEP file’s structure, employing both backward and forward chaining methods in accordance with the CAx-IF recommended practices documentation. The extracted information for each feature encompassed origin, versor, surface geometry, and PMI. Figure 6 illustrates a schematic representation of the implemented extraction chain for textual PMI annotations.
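The following heavily simplified sketch conveys the idea behind this chaining: the STEP file is parsed into an entity map, and references such as #79 = GEOMETRIC_ITEM_SPECIFIC_USAGE(…) in Fig. 6 are then followed toward the annotated geometry. A real implementation must handle multi-line and nested entities and the full CAx-IF chains, which this sketch deliberately ignores:

import re

# One STEP entity per line is assumed: "#id = TYPE(args);"
ENTITY = re.compile(r"#(\d+)\s*=\s*([A-Z0-9_]+)\s*\((.*)\);")

def parse_step(path):
    # Build a map: entity id -> (entity type, raw argument string).
    entities = {}
    with open(path, "r", errors="ignore") as step_file:
        for line in step_file:
            match = ENTITY.match(line.strip())
            if match:
                entities[int(match.group(1))] = (match.group(2), match.group(3))
    return entities

def referenced_ids(args):
    # Forward chaining helper: ids referenced inside an entity's arguments.
    return [int(i) for i in re.findall(r"#(\d+)", args)]

entities = parse_step("product.stp")  # placeholder file name
for eid, (etype, args) in entities.items():
    if etype == "GEOMETRIC_ITEM_SPECIFIC_USAGE":
        print(eid, "->", referenced_ids(args))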

Module_3a

This module was developed to establish a FastAPI (FastAPI, 2024) server for receiving and processing data pertaining to the PMI annotations in the STEP file as extracted in Module_2.

Its primary objective was to associate specific vision inspection algorithms from the vision library with the respective parameters for each feature extracted from the STEP file. This process relied on IF-THEN rules based on AI reasoning techniques, particularly RBR. For instance, if an annotation such as “absence of glue” was indicated, the ML algorithm was automatically selected (i.e., name “ultralytics/yolov5”), with predefined parameters ‘Defects ML/best_V5.pt’ and confidence level of 0.7. These parameters were obtained from Module_1’s experimental training phase and passed via the Config. file.

Fig. 6 Top side: a simplified data schema of the information extracted from the STEP file for a specific feature to be inspected, namely #79 = GEOMETRIC_ITEM_SPECIFIC_USAGE(‘’,’’,#85,#1320, #722), following the CAx-IF recommended practices documentation (CAx Interoperability Forum, 2024). Bottom side (red): the automatically generated JSON file including PMI, param, origin, and versor, also shown in Fig. 4

Similarly, in the case of an annotation indicating “presence”, a rule-based vision inspection algorithm named “rule_based_circle” was chosen. This algorithm was programmed to detect the binary presence of circles, with additional robustness introduced by passing the radius of the circle under inspection (extracted in Module_2). This radius, indicated in Figs. 4 and 6, was passed to Module_3a via the API shown in Fig. 4, along with parameters such as the threshold value (110) and the calibration factor (9.0) read from the Config. file.

When dimensional tolerances were present, the same rule-based vision inspection algorithm “rule_based_circle” was utilized, but with specific circle radius and tolerance values passed from Module_2 via the API shown in Fig. 4. Once again, threshold values and calibration factors were read from the Config. file.

For all the algorithms the light conditions were passed from the Config. file.

As a result, after processing all the inspection features and assigning an algorithm and its parameters to each one, the updated JSON was generated as the output of the module, as shown in Fig. 4.
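A minimal sketch of the RBR mapping described above is given below; the rule set is reduced to the two cases reported in the text, and the Config. structure follows the earlier sketch, so both should be taken as illustrative assumptions:

def select_algorithm(feature, config):
    # Map an extracted PMI annotation to a vision algorithm and its parameters.
    pmi = feature["PMI"].lower()
    if "glue" in pmi:  # e.g., "absence of glue" -> ML-based detection
        ml = config["Vision"]["ml"]
        return {"Algorithm": "ultralytics/yolov5",
                "preTrainedModel": ml["preTrainedModel"],
                "Confidence": ml["Confidence"],
                "Light": "on"}
    if "presence" in pmi or "Tolerance" in feature.get("param", {}):
        rb = config["Vision"]["rule_based"]
        return {"Algorithm": "rule_based_circle",
                "Radius": feature["param"].get("Radius"),
                "Tolerance": feature["param"].get("Tolerance"),
                "Threshold": rb["Threshold"],
                "Calibration_factor": rb["Calibration_factor"],
                "Light": "off"}
    raise ValueError(f"No rule matches the PMI annotation: {feature['PMI']}")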

Module_3b

This module was developed to establish a FastAPI (FastAPI, 2024) server for receiving and processing data related to the geometrical information (origin and versor for each feature) extracted in Module_2.

Within this module, we implemented the self-registration of digital-physical information to transform the {CAD} frame data into the {B} frame, defining the position and orientation of the camera for each feature. The output of this module consisted of a list of points (x, y, z) indicating the camera position and of ZYZ Euler angles indicating how to orient the camera orthogonally to the surface for each inspection feature. The main steps and their implementation are outlined as follows.

Step1

Synthetic point cloud extraction from the 3D STEP CAD model was achieved by sampling points on the 3D model’s surface. This initial task was performed manually using MeshLab (MeshLab, 2024) and exporting the PLY file to the working folder. Upon (randomly) positioning the product in the robot workspace, the real object point cloud was collected using the depth.read() function from the Intel RealSense SDK 2.0 (Intel RealSense SDK 2.0, 2024), considering only those points below a given threshold (Z < 30 mm). Automatic registration of the two point clouds was then performed to obtain the transformation matrix from {CAD} to {C}. This involved calculating the scale factor (the actual and artificial point clouds may have different units of measure; in this case, the camera unit was m while the CAD unit was mm), rotation, and translation using a Python script with the Probreg library (GitHub - Neka-Nat/Probreg, 2024). Figure 7 illustrates the actual point cloud registration process for a specific object pose, along with an example of the calculated scale factor, rotation matrix, and translation vector.
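A minimal sketch of this registration step, following the usage pattern of the Open3D and Probreg documentation, is reported below; the file names are placeholders and the pre/post-processing of the clouds (depth thresholding, downsampling) is omitted:

import numpy as np
import open3d as o3d
from probreg import cpd

source = o3d.io.read_point_cloud("cad_sampled.ply")      # synthetic cloud sampled from the CAD model
target = o3d.io.read_point_cloud("realsense_scan.ply")   # cloud captured by the D415 depth sensor

# Coherent Point Drift estimates scale, rotation, and translation from {CAD} to {C}.
tf_param, _, _ = cpd.registration_cpd(source, target, tf_type_name="rigid")
scale, rotation, translation = tf_param.scale, tf_param.rot, tf_param.t

def cad_to_camera(point_cad):
    # Apply the estimated similarity transform to a {CAD} point.
    return scale * rotation @ np.asarray(point_cad) + translation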

Step2

Leveraging the constant 3D calibration from the Config. file (the {B} to {F} to {C} transformation is constant because the robot home position is constant), the origins and versors of the features to be inspected were transformed into the final {B} reference frame of the robot. For versors, only the rotation was applied. The feature’s origin point underwent rotation and translation, defining the camera position adjusted by a Z offset of 155 mm.
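The following sketch illustrates this second step under the assumption that the base frame {B} is expressed in meters (hence the 0.155 m stand-off) and that the registration result from Step 1 is available as a scale, rotation, and translation; the function name and argument names are illustrative:

import numpy as np

def to_base_frame(origin_cad, versor_cad, scale, R_cad_c, t_cad_c, R_b_c, t_b_c,
                  offset=0.155):
    # Origins: full similarity transform {CAD}->{C}, then the constant hand-eye
    # calibration {C}->{B} read from the Config. file.
    p_c = scale * R_cad_c @ np.asarray(origin_cad) + t_cad_c
    p_b = R_b_c @ p_c + t_b_c
    # Versors: rotations only (no scale, no translation), then renormalization.
    v_b = R_b_c @ R_cad_c @ np.asarray(versor_cad)
    v_b = v_b / np.linalg.norm(v_b)
    # Camera position: stand-off along the outward surface normal.
    camera_position = p_b + offset * v_b
    return camera_position, v_b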

Step3

Achieving camera orthogonality to the product required aligning the Z direction of the camera with the versor, but with opposite sign. This operation yielded three rotation angles around the main axes, defining the rotation of the end effector relative to the {B} frame of the robot. Specifically, ZYZ Euler angles were calculated as follows.

$$\alpha_{z}=\operatorname{arctan2}\left({y}_{v}, {x}_{v}\right),$$
$$\beta_{y}=-\arccos\left(-{z}_{v}\right),$$
$$\gamma_{z}=\operatorname{arctan2}\left({y}_{u},{x}_{u}\right).$$

Where \(\alpha_{z}\) was the first rotation around Z, \(\beta_{y}\) was the rotation around Y, and \(\gamma_{z}\) was the second rotation around Z. The values of the three angles were expressed in radians.

For \(\alpha_{z}\), the components \({y}_{v},{x}_{v}\) (i.e., components of the directional versor, expressed in {B}, of the surface under inspection) were used, while for \(\beta_{y}\), the component \({z}_{v}\) of the same directional versor in {B} was utilized. After applying the ZY rotations (alpha and beta), the camera achieved orthogonality to the surface but retained one degree of freedom, requiring gamma to be fixed. For \(\gamma_{z}\), the components \({y}_{u},{x}_{u}\) (i.e., of a versor lying in the plane \(xy\) orthogonal to the fixed direction of \(z\)) were used to avoid leaving the camera in an arbitrary orientation in the plane \(xy\). In this specific case, these values were calculated using basic trigonometry rules outside the scope of this paper.

Fig. 7 (a) Point clouds after registration (applying the scale factor, rotation, and translation to the source cloud (blue), artificially sampled from the CAD, to align it with the target cloud (green), which was physically captured by the depth sensor of the camera). (b) An example of the calculated parameters for registering a specific pose. The scale factor of 0.001 is due to the difference in measurement units between the CAD (mm) and the camera (m)

The JSON file output from this process, storing the \((x, y, z)\) coordinates of the camera and the ZYZ orientation for each feature, is presented in Fig. 4 as the output of Module_3b.
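As a worked example of the equations above, the sketch below computes the three ZYZ angles from the surface versor v = (x_v, y_v, z_v) and the in-plane versor u = (x_u, y_u); the function name is illustrative:

import numpy as np

def zyz_from_versors(v, u):
    # v: outward unit normal of the surface, expressed in {B}; u: in-plane versor fixing gamma.
    x_v, y_v, z_v = v
    x_u, y_u = u
    alpha_z = np.arctan2(y_v, x_v)
    beta_y = -np.arccos(-z_v)
    gamma_z = np.arctan2(y_u, x_u)
    return alpha_z, beta_y, gamma_z  # radians

# Example: a feature facing straight up (v = [0, 0, 1]) gives beta_y = -pi,
# i.e., the camera is flipped to look down at the surface.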

Module_4

This module was established to configure a FastAPI (FastAPI, 2024) server receiving and processing data pertaining to feature algorithms/parameters (from Module_3a) and camera coordinates/orientations (from Module_3b).

Upon verifying the receipt of both pieces of information, the module merged the two lists of dictionaries by matching their IDs (Module_3a and Module_3b generated two lists with IDs in random order; however, the order is the same for both).

Module_4 also read the Config. file for additional input, determining the path strategy that dictated how to organize the order of the new list, ultimately defining the ReCo file routine. For experimental purposes, both the exact and the guess strategies were implemented.

In the case of the exact strategy, the list provided in the Config. file (i.e., 1, 4, 3, “rotate”, 5, 2) served as the reference order of granular tasks. The information regarding algorithms/parameters and positions/orientations was reordered accordingly, with a rotation of 1.57 radians (the value stored in the Config. file) inserted.

For the guess strategy (highlighted in the Config. file of Fig. 4), a custom approach was implemented. The first feature to be inspected was selected based on the minimum y (ID1), followed by the maximum z (ID4), the maximum y (ID3), a rotation of 1.57 (“ID”: “Rotate”), the minimum y after rotation (ID5), and finally the maximum y in the rotated coordinates (ID2).

The ReCo file produced as output was a JSON file containing the ordered list of inspection routine IDs, comprising the original feature IDs along with the IDs of other granular tasks presented as strings, such as “rotation”, together with their respective values. An example of the ReCo file is partially reported as the output of Module_4 in Fig. 4 and fully reported as additional material.
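For illustration only (the complete file is provided as additional material), a compressed sketch of such a routine is given below; the field names mirror the earlier sketches and the numeric poses are placeholders:

reco_file = {
    "Routine": [
        {"ID": 1, "Algorithm": "rule_based_circle", "Radius": 15.0, "Tolerance": None,
         "Threshold": 110, "Calibration_factor": 9.0, "Light": "off",
         "Position": [0.35, -0.12, 0.22], "ZYZ": [1.57, -1.57, 0.0]},   # placeholder pose
        {"ID": 4, "Algorithm": "ultralytics/yolov5", "preTrainedModel": "best_V5.pt",
         "Confidence": 0.7, "Light": "on",
         "Position": [0.30, 0.00, 0.40], "ZYZ": [0.0, -3.14, 0.0]},      # placeholder pose
        {"ID": "Rotate", "Value": 1.57},                                 # granular handling task
    ]
}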

Inspection process

The inspection process orchestrated the interaction between motion and vision programs through an API (FastAPI, 2024). Initially, the motion program was triggered as the robot was in its home position, requiring the camera to be positioned for inspecting the first feature in the ReCo file order. Once the camera was properly positioned and oriented relative to the inspection surface (with offset accounted from the Config. file), a request was sent from the motion program to the vision program, as illustrated by the arrow in Fig. 4. Upon completion of the vision inspection (calling the specific function from the vision functions library and extracting the relevant parameters to execute the function), the vision program sent a message triggering the motion program to proceed with its subsequent tasks (e.g., other movements involving camera positioning and orientation, or product handling and rotation according to the granular task order listed in the ReCo file).
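The request/reply pattern of this orchestration can be sketched as follows; the endpoint address and the two callables standing in for the ROS motion commands and the gripper-based rotation are hypothetical placeholders:

import requests

VISION_URL = "http://192.168.56.1:8001/inspect"  # placeholder address of the vision program

def run_routine(routine, move_camera_to, rotate_product):
    # Execute the ReCo routine: move, trigger the vision program, wait, continue.
    for task in routine:
        if task["ID"] == "Rotate":
            rotate_product(task["Value"])               # granular handling task (gripper)
            continue
        move_camera_to(task["Position"], task["ZYZ"])   # ROS/MoveIt motion task
        reply = requests.post(VISION_URL, json=task)    # trigger the inspection for this feature
        reply.raise_for_status()                        # proceed only once the inspection replies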

Throughout this process, access to the ReCo file generated by Module_4 and to the Config. file generated by Module_1 was required. To access these files, we implemented a shared folder, given the specific technical decision to run ROS on a VM for the motion program. For the vision program, there were no issues, as it was implemented on the same machine where the pipeline modules were running.

Further detailed design decisions regarding motion and vision programs for this specific implementation are provided below.

Motion program

No bounding box information was passed in the Config. file for this application. To prevent camera collisions with the product, intermediate points were included between each movement. The motion program ran on VMware Workstation 17 Player (VMware Workstation Player, 2024) with ROS Melodic Morenia (Melodic - ROS Wiki, 2024) installed on Ubuntu 16.04 (Ubuntu, 2024) and was able to transfer files from the host machine to the VM via a shared folder created using the built-in functionalities of the VM software, facilitating access to the Config. and ReCo files. The MoveIt library (MoveIt Motion Planning Framework, 2024) was utilized to perform the motion planning between two points. The physical attributes of the manipulator and camera were added in the Xacro file (Xacro - ROS Wiki, 2024), while the end effector (i.e., gripper and camera) was included in the SRDF file (Srdf - ROS Wiki, 2024). The tf.transformations library (Transformations — Tf 0.1.0 Documentation, 2024) was employed for angle conversions during motion tasks. A ROS motion server was used to move the robot, and a ROS state server received real-time status updates on the robot. The home position was defined experimentally by fixed joint positions. Left, top, and right positions were dynamically defined according to the ZYZ Euler angles in the ReCo file. HTTP requests were employed to close/open the gripper or switch the light on/off, and to communicate that a movement was completed and the inspection could proceed.

Vision

Once each inspection point was reached, the vision program retrieved the specific vision inspection algorithm from the vision library, located in a folder inside the main directory, and parametrized it. The algorithm associated with the specific feature was read from the ReCo file, along with its parameters, which could have values or be null (i.e., {“Radius”, “Tolerance”, “Threshold”, “Calibration_factor”, “Confidence”, “preTrainedModel”, “Light”}). After executing the inspection algorithm, the results were appended to a list, and images were saved in the folder in the event of a defect. Once the motion program completed its tasks, a request was sent from the motion program to the vision program to generate the “Results” report. This report contained the inspection results for each feature, with images saved for identified defects, accessible in the vision inspection folder.

Some images, showcasing both the rule-based and ML-based approaches, are presented in Fig. 8. Only the top-side view presents a detected defect.

Tables 3 and 4 depict the experimental batch comprising variant 1 and variant 2. For every variant, six inspection features were evaluated on six different samples, resulting in a total of 72 quality control checks.

Table 3 Batch 1 variant 1.1 (30 mm top diameter)
Table 4 Batch 2 variant 1.2 (16 mm top diameter)
Fig. 8 Inspection images output from the inspection process: (a) rule-based presence check of side view (1), (b) rule-based dimensional check with the measurement in mm of the top element diameter, (c) ML-based absence-of-glue check with the confidence value and the detected defect, and (d) rule-based presence check of side view (2)

Adopting the metrics from the literature (Lupi et al., 2023a; Zheng et al., 2021),

$$TPR\ (Recall)=\frac{TP}{TP+FN};\quad FPR=\frac{FP}{FP+TN};\quad Accuracy=\frac{TP+TN}{TP+FP+TN+FN};\quad Precision=\frac{TP}{TP+FP}$$

where:

TP (true positive): the number of defects that have been correctly classified as defects.

FN (false negative): the number of defects that have been incorrectly not classified as defects.

FP (false positive): the number of non-defects that have been incorrectly classified as defects (false alarm).

TN (true negative): the number of non-defects that have been correctly classified as non-defects.

True Positive Rate (TPR), also called Recall, should tend to 100%, while False Positive Rate (FPR) should tend to 0%. A value above 0% for FPR can be accepted for conservative purposes, e.g., during system development. Accuracy and Precision should tend to 100%. Testing the system on a sample of 72 quality control checks yielded the following results: TP = 22; FP = 3; TN = 46; FN = 1. Therefore, TPR = 96%, FPR = 6%, Accuracy = 94%, Precision = 88%.

While computational time is not the primary focus of this paper, and productivity optimization is beyond the current scope, readers interested in quantitative data regarding the computational time for the CAD to ReCo file pipeline and inspection process can derive this information from the two videos provided as supplementary material.

It should be noted that the experiments were conducted on a PC equipped with an 11th Gen Intel(R) Core(TM) i7-11370H @ 3.30 GHz processor and 16.0 GB of RAM.

Discussion and future developments

The newly defined framework and its implementation aim to address RQ1 by providing a revised blueprint that enhances the TRL of the existing modular framework for next-generation VIS. The focus lies on integrating the modules within the pipeline to transition from the 3D CAD STEP AP242 file to a ReCo file for the VIS system. For broad applicability in an industrial context and enhancing the TRL, the pipeline is fully integrated into the proposed framework, including configuration and setup activities (with the introduction of the Config. file) and the inspection process (motion and vision programs).

Modularity and integration within the pipeline now allow users to select a subset of modules or the entire suite, with the option to customize the modules according to specific needs. This customization can be achieved internally, if the necessary skills are available, or via outsourcing (e.g., to industrial marketplace hubs). In these hubs, various stakeholders—such as Original Equipment Manufacturers (OEMs), System Integrators, End Users and VIS as a Service (VISaaS) providers—can subscribe, following a servitization approach (Akula et al., 2017). In the context of next-generation VIS, servitization involves transforming traditional product-centric offerings into comprehensive service-oriented solutions.

From an applicability perspective, the framework now offers a more detailed and formal definition of the entities and their integration. It formally defines what a ReCo file is and how it is adopted to reconfigure the system, addressing RQ2. Additionally, the framework now defines the Config. file, a major achievement because it allows for the inclusion of specific custom aspects of the configuration setup (which cannot be stored in the ReCo file yet require flexible adaptation) that need to be considered by the pipeline modules and the inspection process.
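Purely as an illustration of the kind of setup information such a file could hold, a hypothetical Config. structure is sketched below as a Python dictionary; the keys and values are invented for the example and do not reflect the authors' actual schema.

```python
# Hypothetical sketch of a Config. file structure; all keys and values are
# illustrative placeholders, not the authors' schema.
CONFIG = {
    "camera": {"calibration_factor_mm_per_px": 0.12, "standoff_offset_mm": 150},
    "robot": {"home_joint_positions_deg": [0, -90, 90, 0, 90, 0]},
    "product": {"bounding_box_mm": None},          # left empty in the reported use case
    "lighting": {"end_effector_light_default": "off"},
    "paths": {"vision_library": "./vision_library", "results_folder": "./results"},
}
```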

Focusing on Module_1, the framework allows for the application of different hardware solutions within various application scenarios, as shown in Fig. 9. The hardware flexibility bottleneck was identified as the primary limitation preventing the achievement of the long-term goal of a generative VIS. However, 6DOF hardware integrated with sensors has been identified as a valuable solution for inspecting small product batches in the case of generative inspection problems, with particular relevance for remanufacturing (Kaiser et al., 2022) and reverse inspection for disassembly (Münker & Schmitt, 2022). Other approaches can rely on NC cartesian axes and reconfigurable cells when scaling up production volumes and shifting to family variants (or large-sized products) (Chang et al., 2019). To extend the working volume of a 6DOF robot, it can be combined with an Automated Guided Vehicle (AGV) or operated in cooperation with multiple robotic arms (Abbas et al., 2023). Other flexible hardware solutions, such as Unmanned Aerial Vehicles (UAV), can be used to inspect fixed layouts (e.g., in the naval, civil, and construction sectors) presenting large-sized, complex, and intricate geometries (Rakha & Gorodetsky, 2018). As shown on the right side of Fig. 9, for mass manufacturing, conventional customized, rigid, and monolithic VIS solutions remain the most recommended due to high productivity requirements, eliminating the necessity for flexible vision inspection architectures (Mar et al., 2011).

As a future direction, to further validate the framework, ongoing research with the same VIS hardware is addressing multiple use cases of individual products with highly variable shapes produced via Fused Deposition Modeling (FDM) AM within the project production domain (Fig. 9, left side). Additional use cases in industrial settings are required to fully validate the framework.

Scaling the complexity of the 3D CAD model (e.g., number of components in an assembly, file size, number of semantic PMI) poses several challenges, both in CAD enrichment (information encoding during the design process) and annotation extraction (information decoding during Module_2). Information encoding/decoding represents a dual-faced aspect, and enhancing the former would benefit the latter. One of the key findings of this paper is the definition of the entities needed to enrich the 3D CAD file. These consist of (i) textual/dimensional PMI (i.e., to associate specific inspection algorithms and light settings), (ii) the point to which the PMI refers (for camera positioning, considering the offset), and (iii) the surface to which the PMI refers, indicated via the versor centered at the point (for camera orientation).
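A minimal sketch of these three entities as a data structure is given below; the field names are illustrative only and are not the STEP AP242 encoding itself.

```python
# Sketch of the three enrichment entities as a data structure (field names are
# illustrative, not the STEP AP242 representation).
from dataclasses import dataclass
from typing import Tuple

@dataclass
class InspectionPMI:
    text: str                           # textual/dimensional PMI -> algorithm and light settings
    point: Tuple[float, float, float]   # reference point for camera positioning (with offset)
    versor: Tuple[float, float, float]  # unit normal at the point, for camera orientation

example = InspectionPMI(text="presence check, light on",
                        point=(10.0, 0.0, 35.0),
                        versor=(0.0, 0.0, 1.0))
```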

Fig. 9

Production system taxonomy highlighting the application field of the proposed next-generation VIS in comparison with conventional dedicated VIS. Dotted lines separate the application scopes. Focusing on the new generation of VIS, hardware flexibility (Module_1) scales with product variety and application scenarios (large vs. small-sized products)

Due to the absence of software for creating custom entities for STEP files, the solution adopted in this work relies on general PMI annotations to demonstrate the feasibility of the proposed workflow. However, a more rigorous and scalable approach would involve the creation and utilization of custom STEP entities to store and retrieve these three key characteristics more efficiently.

As an additional issue, the computational time of the remaining modules of the CAD to ReCo file pipeline, including algorithm selection and parametrization and path planning, as well as of the inspection process (motion and vision), may scale significantly with the complexity of the CAD model. A performance metrics analysis across multiple use cases of increasing CAD complexity is ongoing.

From the Module_3a perspective, the selection of the specific algorithms and parameters relies on basic IF-THEN rules, but the alignment between the inspection needs (quality checks) and the available inspection algorithms could be improved by defining defect ontology data structures. This task could involve more sophisticated AI reasoning, possibly integrating past cases and pre-trained LLMs capable of interpreting more descriptive textual annotations (Naveed et al., 2023; Wang et al., 2023). Inspection libraries could be browsed and obtained through a vision inspection marketplace hub, where Software Developers, in addition to the previously mentioned stakeholders, could upload their algorithms and datasets. This opens new avenues to reduce the need for low-level coding and time-consuming data collection, while also providing more adaptable and plug-and-play algorithmic solutions towards servitization. Among the various algorithms, deep learning approaches may excel in generalizing surface defects or texture-related issues, potentially becoming widely adopted due to their automation, adaptability, and robustness for next-generation autonomous VIS (Sun & Sun, 2015; Zheng et al., 2021).
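A minimal sketch of such IF-THEN selection logic is shown below; the rule conditions, algorithm names, and default parameter values (apart from the 2 mm tolerance and 0.7 confidence threshold mentioned in this paper) are illustrative assumptions.

```python
# Illustrative IF-THEN mapping from a quality-check annotation to an inspection
# algorithm and its parameters; rules and names are assumptions, not the authors' code.
def select_algorithm(pmi_text, pmi_value=None):
    text = pmi_text.lower()
    if "diameter" in text or "dimension" in text:
        return "dimension_check", {"Radius": pmi_value, "Tolerance": 2.0}
    elif "glue" in text:
        return "yolo_glue_detection", {"Confidence": 0.7, "preTrainedModel": "glue.pt"}
    elif "presence" in text:
        return "presence_check", {"Threshold": 0.5}
    else:
        raise ValueError("No inspection algorithm matches the annotation: " + pmi_text)

print(select_algorithm("Top element diameter", pmi_value=15.0))
```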

From the standpoint of the experimental visual inspection implemented in Module_3a, the 2 mm tolerance, which is acceptable in the furniture industry, is related to the resolution of the selected camera. Higher resolution, and thus better overall performance, can be achieved depending on the specific hardware selected.

Module_3b could be enhanced in terms of LoA, including the automatic generation of the artificial point cloud within the module itself. No other specific challenges remain for this module, which has proven to be quite stable. Processing time, however, is a major issue, and more efficient algorithms or strategies for point cloud registration could be used (Yang et al., 2021), particularly in those cases where point cloud registration must be performed in real time.
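As one possible registration strategy (the paper does not prescribe a specific library), a point-to-point ICP alignment could be sketched with Open3D as follows; the file names and distance threshold are placeholders.

```python
# Point-cloud registration sketch using Open3D ICP, shown only as one possible
# approach; not necessarily the method used in Module_3b.
import open3d as o3d
import numpy as np

source = o3d.io.read_point_cloud("artificial_cad_cloud.ply")   # hypothetical file names
target = o3d.io.read_point_cloud("scanned_cloud.ply")

threshold = 5.0            # max correspondence distance (same unit as the clouds)
init = np.eye(4)           # initial alignment guess
result = o3d.pipelines.registration.registration_icp(
    source, target, threshold, init,
    o3d.pipelines.registration.TransformationEstimationPointToPoint())
print("Fitness:", result.fitness)
print("Transformation:\n", result.transformation)
```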

Module_4 implements the routing of the granular tasks to be performed in the inspection process and the definition of the ReCo file. This module could be integrated with a library of path strategies, and a weighted approach could be adopted in which a weight is assigned to each specific inspection feature as an additional characteristic in the STEP file. This information could then be used to rank the inspection features and, if necessary, stop the inspection process early when a major defect is found (see the sketch below). Handling strategies depend on the specific case. Compared with the previous framework, the relevant information was added to the Config. file to generalize the problem.
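A sketch of such a weighted routing strategy is given below; the weight field, the stopping threshold, and the inspect callback are hypothetical illustrations of the idea rather than the implemented module.

```python
# Sketch of a weighted routing strategy: rank inspection features by an assumed
# per-feature weight and optionally stop when a highly weighted defect is found.
def run_routine(features, inspect, stop_weight=0.9):
    """features: list of dicts with 'Feature_ID' and 'Weight';
    inspect: callable returning True if a defect is detected."""
    results = []
    for feat in sorted(features, key=lambda f: f["Weight"], reverse=True):
        defect = inspect(feat)
        results.append((feat["Feature_ID"], defect))
        if defect and feat["Weight"] >= stop_weight:
            break      # major defect found: abort the remaining checks
    return results
```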

The inspection process comprises the motion and vision programs, which are case-specific. Setting aside the case-specific decisions of the inspection process (similarly to Module_1) helps in generalizing the framework.

Results show an overall 6% FPR, demonstrating high accuracy in the inspection process. The FP value shows a pattern of repetition (always occurring on the first feature); after adjusting the experimental setup and changing the inspection routine (e.g., switching on the light on the end effector to better illuminate the darker zone), the accuracy can be further enhanced. This aspect needs to be addressed through an iterative refinement of the Config. file defined in Module_1. A 96% TPR demonstrates the high performance of the system in detecting defects, with the only miss caused by the ML-based YOLO algorithm. This was due to the selected confidence threshold, set at 0.7. A lower threshold, further training on the dataset, or better lighting conditions could have avoided this error. Similarly to the other metrics, these adjustments could be made in Module_1 by updating the Config. file, and consequently the ReCo file, during a subsequent system run.

As a possible development, the inspection report could be associated with specific standards to automatically record inspection data with a defined data model structure. Currently, the inspection report provides all the information of the inspection, including images of the defects, which can be stored for liability reasons as proof of the inspection.

Conclusions

This paper presents an exploration of the latest developments in next-generation flexible and reconfigurable CAD-based autonomous Vision Inspection Systems (VIS). Notable contributions include:

  • The recognition of three main sources of information required to enrich the CAD file and to use it as a single source of truth (e.g., for design and inspection): (i) textual/dimensional PMI (i.e., to associate specific inspection algorithms and light settings), (ii) the point to which the PMI refers (for camera positioning, considering the offset), and (iii) the surface to which the PMI refers, indicated via the versor centered at the point (for camera orientation). The remaining geometrical information and product metadata are embedded in the 3D model description by design.

  • The integration and enhancement of the recently proposed VIS framework via a novel streamlined pipeline from enriched 3D CAD models to ReCo files.

  • The definition of a Config. file encapsulating essential hardware/software calibration and setup information adopted for ReCo file generation and for the execution of the inspection process.

  • The definition of the ReCo file, which comprises a list of inspection routines (ordered granular tasks) determined by camera positions and orientations (based on path, handling, and lighting strategies) along with specific inspection algorithms and parameter associations, utilized for the execution of the inspection process.

  • The practical implementation of the system through laboratory experimental validation, including 72 quality control checks across two product variants, each with six inspection features of interest tested on six samples, demonstrating the system’s effectiveness in defect detection within a real-world context.

  • A comprehensive discussion of potential future research directions for each module, including promising research avenues pointing to novel business model possibilities and applications rooted in servitization, as well as opportunities for vision inspection marketplace hubs.

The goal of this work was to bridge the gap outlined in the previously proposed framework, with the ultimate objective of offering a novel streamlined CAD to ReCo end-to-end solution to elevate the TRL for industrial adoption of the framework itself. To the best of the authors’ knowledge, this marks the first attempt of its kind, opening a new and promising research area.