1 Introduction

Manufacturing assembly lines are subject to continuous fluctuation in production demands such as customization and quantity [15]. While many continuous and repetitive assembly tasks can be automated to improve production efficiency, the introduction of new product variants to the production line consistently poses a major challenge to automation. Additionally, it is necessary to guarantee workers’ well-being in newly automated industrial environments, which have evolved from mechanization to cognitive, and even emotional, interactions.

Humans are part of every engineered system and can increase competitiveness when integrated in manufacturing processes where the customization level and volume of production change in relatively short intervals. This is because humans can adapt to new manufacturing operations without disrupting the production environment. Consequently, overall human-machine performance in industrial systems is a fundamental engineering concern for processes that are characterized by a significant amount of manual human work. In such systems, human factors (including physical, mental, psychosocial, and perceptual factors) can determine the worker’s performance to some extent [23]. When human-machine system designs neglect specific features and qualities of human workers and presume their performance to be constant over time, they can have a negative impact on workers’ well-being or overestimate their performance [21]. Human factors such as repetitiveness of tasks, handling of heavy loads, fatigue, and static, awkward postures expose workers to ergonomic risks that adversely affect their performance.

Full robotic automation can be found in manufacturing sectors such as the automotive industry, where robots are designed for press tending, car body assembly, painting, and, to a large extent, the assembly of car engines [10, 16]. The value of automation is not just about reducing labor costs. It is about integrating machines that are more reliable than human workers in tasks that require speed, heavy lifting, work for longer periods without interruption, repetition, and other human factors that inhibit human performance and safety [10]. At the frontier of manufacturing automation technology, Flexible Assembly Systems (FAS) assemble different models of specific product families with negligible configuration times [53]. However, it is rare for robots to be used for final assemblies. One major challenge in undertaking industrial robotics is the design of more economically feasible solutions to handle complex assembly tasks, product geometry variability, and intuitive and interactive means of dealing with process and geometric tolerances [10].

Machines can impair workers’ performance and well-being when automation is poorly designed, and the fact that human workers are more prone to errors than machines also poses a great challenge in the design of new assembly lines and automation processes. The unpredictable and erratic behavior of human workers can limit the optimal operation of robots. Facilitating communication between workers and machines can mitigate the risk to the capabilities of both parties. A limitation of assembly automation is that the flexibility of the system is restrained by the automation design for existing product families. Consequently, manual work cannot be efficiently combined with automated systems which are constantly replaced, often forcing parts of the assembly line to be exclusively manual or automated. All these limitations emphasize the gap in the research literature concerning the integration of human factors engineering into industrial system design, as well as the need to investigate how industrial technologies affect human operators [23, 24, 49, 50, 58].

Removing the human factor can decrease the complexity of industrial solutions. However, maintaining the presence of human workers in the assembly chain can create a competitive advantage because humans can rely on their natural senses to form complex and intuitive, yet instant, solutions whereas robots require reprogramming to address new product families and manufacturing problems. Manufacturing industries have found that augmented reality (AR), virtual reality (VR), and mixed reality (MR) designs and technologies have the potential to empower workers, actively and passively, to perceive and learn information about new assembly processes in a timely manner, without altering their established work routines. Hence, these technologies have the potential to reduce the cognitive load required for a worker to learn assembly procedures for new product families.

VR immerses workers in a fully artificial digital environment while AR overlays virtual objects on the real-world environment [39]. MR is a system in which a worker is immersed in a digitized environment or interpolates between digitized and physical ones, the nature of the experience widely varying depending on the context [36, 37]. Furthermore, Cross Reality (XR) is defined as a specific type of MR system wherein the worker interacts with the system in order to change the physical environment [4].

The heart of the current research involves the detailed dissection of these technologies, their relationship to each other, and their unique abilities to augment workers’ cognition and to facilitate skills transfer in industrial assembly environments. We have also investigated how industrial sensors and robots can collaborate and support the transfer of skills to human workers in human-machine hybrid environments. Our objective is to empower human workers with means to augment their production environments with information that adapts to new product families and facilitates the acquisition of new assembly skills. In our case study, we investigate the impact of our system in assembly tasks performed by workers with disabilities, where the customization level and volume of the production change in relatively short intervals.

The remainder of this manuscript is structured as follows. The next section reviews previous studies of augmented reality applied to the field of Industrial Design and Manufacturing. Section 3 describes the research fundamentals needed to define the solution described in Section 4. The technological setup is described in Section 5, and the remaining sections summarize the results and discuss possible future work.

2 Literature review

The use of AR, VR, and MR to enhance natural environments and human cognition has been an active research topic for decades. Along the way, industrial augmented reality (IAR) emerged as a research line focused on how these technologies can support human workers in manufacturing processes. The use of IAR can be traced back to the seminal work of Thomas Caudell and David Mizell at Boeing [12] in 1992 and to the contributions of Navab [41] and Fite-Georgel [18].

With AR, VR, and MR technologies becoming increasingly robust and affordable, new use cases and applications are being explored. Kollatsch et al. [34] developed a prototype for the visualization of information from control systems (e.g., PLC, CNC) directly on-site. Simões et al. [60] proposed a middleware to create tangible in-site visualizations and interactions with industrial assets. Gauglitz et al. [20] proposed tablet software that augments airplane cockpits with AR instructions. Henderson et al. [26] investigated the use of HMDs in maintenance tasks for military vehicles through projected instructions and guidance. Other authors took augmentation a step further and introduced systems for industrial teletraining and maintenance featuring augmented reality and telecollaboration [40, 66]. In addition to augmenting the physical environment, these solutions enable experts to remotely collaborate and exchange knowledge at reduced costs.

Empirical data on the use of IAR in real-world environments is increasingly available and presents well-grounded insights into the efficiency of such systems. However, there are a number of challenging factors of IAR development that need further research. Some challenges are more transversal and related to the necessary interdisciplinary knowledge in areas such as computer graphics, artificial intelligence, object recognition, and human-computer interaction [28]. For example, intuitive user interfaces still remain a challenge, particularly in situations where understanding the user’s actions and intentions is required for their adaptation in unexpected conditions. Along this line, Feiner et al. [17] presented a prototype IAR system that implemented what they defined as Knowledge-based Augmented Reality. The prototype relies on a rule-based intelligent backend to create graphics that respond to the communicative intent of the AR system. Feiner and his colleagues represented the communicative intent as a series of objectives that define what the generated graphical output is expected to accomplish. The authors demonstrated the prototype in a test case that helps workers with laser printer maintenance. While the proposed work is clever in how it creates the user interface, it is neither adaptive nor intelligent from a supervision standpoint.

Other research challenges are specific to assembly tasks. Assembling is the process of putting together several separate components in order to create a functional one [28]. Hence, an assembly task requires locating, manipulating, joining parts, and ensuring the expected quality. One critical task is to guide the worker towards certain pieces and parts. Schwerdtfeger and Klinker [57] compared visualization techniques to give positional and directional guidance to the human assembler. To prevent the drawbacks of AR smart glasses, mainly their limited field-of-view, projection-based approaches have been broadly presented as an alternative. For example, Sand et al. [54] developed a prototype to project instructions into the physical workspace, which enabled the worker not only to find the pieces but also to assemble products without prior knowledge. Rodriguez et al. [52] proposed a similar solution in which instructions were directly overlaid with the real world using projection mapping. Petersen et al. [47] projected video overlays into the environment at the correct position and time using a piecewise homographic transform. By displaying a color overlay of the user’s hands, feedback can be given without occluding task-relevant objects. Other authors described mechanisms to detect and prevent human and machine errors, using computer vision, machine learning, and remote assistance methods [40, 43, 59, 64]. In a parallel line of research, Bortolini et al. [7] investigated the impact of digital technologies in assembly system design and management. Facio et al. [16] described how digital part-feeding policies can improve flexible systems in macro and micro-logistic aspects, and Bortolini et al. [6] applied multi-objective optimization models for work balancing to minimize assembly line takt time and ergonomic risks.

Another major problem facing IAR is the development overhead of AR applications, which requires the creation of content and the design of the worker experience. There are popular software libraries like ARToolKit [31] and Vuforia [32] that can detect objects and depict 3D models in real time. However, their use requires programming skills to develop AR applications. An alternative is the use of AR authoring tools, which were proposed a decade ago [25, 33, 51, 60, 62]. Their main advantage is that they do not build upon costly and time-consuming recompilation steps, and consequently the updates to the application are fast and can be completed efficiently. Other authors [14, 40, 60, 63] automated certain aspects of content creation, so assembly instructions can be automatically generated from CAD files, thus reducing the authoring burden. However, fine-tuning existing tools to solve a specific domain problem remains an open challenge.

Competitive advantages of AR and VR in industrial assembly and maintenance have been demonstrated in several studies. Tang et al. [62] have shown that AR-based instructions reduced errors and cognitive load of participants by 82% when compared with paper-based instructions, instructions on a monitor, or even instructions steadily displayed on an HMD. However, they also concluded that occlusions by AR content and presentation of the information over a cluttered background would decrease overall task performance. Boud et al.’s [8] experiments demonstrated that task completion times were longer when 2D drawings were used to train the assembly of water pumps before assembling the real product, in comparison with AR and VR training. Funk et al. [19] deployed an MR system and observed a decrease in performance for expert workers and an increase in skill acquisition for untrained workers. Curtis [13] indicated shortcomings in practicality and acceptability of displaying instructions on HMDs.

Other advantages presented by VR, AR, MR, and XR to assembly scenarios include reduction of data retrieval times [65], improvement of ergonomic behaviors in assembly [22], more comprehensive skill transfers, and reduction in training times [27, 45, 46, 56].

3 Methodology

In this section, we formulate the research problem and an XR solution to augment human cognition in assembly environments. XR technology can mitigate workers’ cognitive limitations and facilitate the acquisition of skills, empowering them with new means to undertake more complex tasks. Throughout this paper, we apply our methodology to the case study of assembling medium-voltage switches. The novel contribution of this work is a flexible and inclusive XR solution to empower assembly workers with new tools to learn and supervise the assembly of products prone to variations in volume and diversity.

3.1 Classic workflow

In the classic workflow, worker disabilities affect their ability to perform their job at three levels: interpretation of the wiring instructions to be executed, performance of conductivity checks on components assembled, and interaction with robots. The classic workflow for the production of medium-voltage switches consists of two steps: Assembly system design by a production manager (see Fig. 1) and interpretation of schematics by shop-floor workers (see Fig. 2).

Fig. 1 Classic workflow for assembly system design

Fig. 2 Classic workflow for the assembly of medium-voltage switches in shop-floors

The system design step is the adaptation of contractors’ data into assembly manuals that are easily comprehended by workers. In this stage, the production manager identifies the assembly sequence for the components. Then, the manager matches the task requirements with the skills in a pool of workers that are available for the job. Finally, the production manager designs a new version of the assembly manual that is suitable for all workers chosen for the job. This is an area where both training times and assembly errors can be reduced in the instructional content creation process.

The generation of these assembly manuals for short-lived production series is time consuming and does not scale for a large variety of human disabilities and hardware. This is a limitation that weighs heavily on companies like Lantegi Batuak that employ more than 2000 workers with disabilities at their facilities. Their latest report [35] asserts that 63% of their workers have mental disabilities, 21% have physical disabilities, and 4% have some kind of mental disorder. The remaining 12% do not have any disability diagnosed. Lantegi Batuak is a company with assembly plants for medium-voltage switches and, at the time of writing, their process for adapting schematics to shop-floor workers is the one described in the classic workflow.

The medium-voltage switch assembly line is serialized in such a way that several workers take part in completing the assembly of each unit, each of them by performing the same intermediate step of assembly (cable wiring) repeatedly on successive units that advance along the assembly line. The entire line can be duplicated to respond to peaks in production demand by training additional workers.

This assembly process can be generalized to other assembly tasks, which may differ in their levels of human-robot collaboration. In our case study, we want to give emphasis to tasks where robots cannot undertake the entire assembly task, either due to technological limitations or costs.

Robots provide human workers with the components that are required in a well-determined order: The sequence described in the assembly manual. For each part, the worker reads the next instruction, searches for the relevant cables and electric components, and does the wiring (see Fig. 2).

Assembled units are then inspected by a skilled worker responsible for the quality control. It is the responsibility of this worker to perform a conductivity check for each individual connection. The quality control process is expensive in personnel and production time. Further contributing to this cost, when the worker detects an assembly mistake, it is necessary to disassemble and reassemble the unit. The introduction of mechanisms for visual inspection during the assembly has the potential to decrease average production overheads and times. Unfortunately, production series are limited to a few units and are highly customized to contractors, which makes this difficult and inefficient to automate.

3.2 Generalization and modelization

Assembly tasks present a unique set of challenges for XR, ranging from interaction issues between workers and robots to human factors and production requirements.

Digital augmentation of assembly shop-floors is a two-phase process. In the first phase, which we call the design phase, we define what the task is and how assembly instructions can augment the worker cognitively. In this phase, it is necessary not to inhibit freedom or creativity, or else the design of the hybrid human-machine space will fail to yield any competitive value over traditional automation. The system should adapt the visual context and schedule a new set of actions whenever a worker undertakes an action (e.g., installing a component) that deviates from these instructions without affecting the assembly sequence. In the second phase, which we call the execution phase, we infer functional requirements from the worker’s skills, human-automation interaction preferences, and task context. A worker-driven XR application has to adapt the assembly instructions to different task contexts and worker skills, requiring a significant level of system modularity. Partial blindness of a worker can, for example, shift the interaction towards the aural sense by activating audio technologies and adapting content presentation. Another example is a task in which the worker’s hands are occupied holding or assembling specific components and are therefore not available for interaction with visual interfaces. The XR system can react to interaction modalities like eye-gaze, voice control, or interaction with projected content.
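To make the execution-phase adaptation more concrete, the following Python sketch selects interaction modalities from a worker profile and a task context. It is only an illustration under stated assumptions: the names `WorkerProfile`, `TaskContext`, `select_modalities`, and the modality labels are hypothetical and are not taken from the implemented system.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WorkerProfile:
    low_vision: bool = False   # hypothetical profile flag


@dataclass(frozen=True)
class TaskContext:
    hands_busy: bool = False   # True while the worker holds or assembles parts


def select_modalities(worker: WorkerProfile, context: TaskContext) -> Dict[str, List[str]]:
    """Pick output and input channels for one instruction (illustrative rules only)."""
    # Low vision shifts presentation towards audio while keeping a visual fallback.
    output = ["audio", "projection"] if worker.low_vision else ["projection"]
    # Busy hands rule out touching projected content, leaving gaze and voice.
    if context.hands_busy:
        inputs = ["eye_gaze", "voice"]
    else:
        inputs = ["projected_touch", "voice", "eye_gaze"]
    return {"output": output, "input": inputs}


choice = select_modalities(WorkerProfile(low_vision=True), TaskContext(hands_busy=True))
```

In this sketch, a worker with low vision whose hands are occupied receives audio-first output and can respond through eye-gaze or voice.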

3.2.1 Modelization of design phase

Let ω(t) be the average time that is needed to optimize the wiring instructions for a shop-floor task t, \(\phi (t, \mathbb {S}) \rightarrow (k, \mathbb {W}) \) the computation that takes k seconds to find a task force \(\mathbb {W} \subseteq \mathbb {S}\) for the task requirements, and \(len(x): \mathbb {W} \rightarrow \mathbb {N}\) the number of disability classes for the subset \(\mathbb {W}\). Let g(t,w) be the time required for adapting the set of instructions t to a specific worker profile w. Then, Eq. 1 determines the time effort required to design an assembly manual for a set of worker profiles \(\mathbb {W}\).

$$ \begin{array}{@{}rcl@{}} ds(t, \mathbb{S}) &=& \omega(t) + k + \sum\limits_{w=1}^{len(\mathbb{W})} g(t, w),\\ (k, \mathbb{W}) &=& \phi(t,\mathbb{S}), \mathbb{W} \subseteq \mathbb{S}, k \in \mathbb{R} \end{array} $$

In the above formulation, the optimization processes ω(t) are considered to be independent from task forces \(\mathbb {W}\). Our research focus aims to enhance human cognition rather than improving these functions, which require dealing with a number of other factors that are outside our research scope, e.g., layout optimization. Hence, it is not our objective to find a global minimum for ω(t) and \(\mathbb {W}\).
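As a worked illustration of Eq. 1, the Python sketch below evaluates the design effort for a small, made-up scenario. The functions `omega`, `phi`, and `g` mirror the symbols of the formulation, but their bodies, the profile labels, and all numeric values (in seconds) are hypothetical placeholders rather than measured data.

```python
from typing import Callable, List, Tuple


def design_time(
    task: str,
    staff: List[str],
    omega: Callable[[str], float],
    phi: Callable[[str, List[str]], Tuple[float, List[str]]],
    g: Callable[[str, str], float],
) -> float:
    """Eq. 1: ds(t, S) = omega(t) + k + sum over profiles w in W of g(t, w)."""
    k, task_force = phi(task, staff)  # (k, W) = phi(t, S), with W a subset of S
    return omega(task) + k + sum(g(task, w) for w in task_force)


# Hypothetical adaptation cost per worker profile class, in seconds.
ADAPT_COST = {"cognitive": 1800.0, "visual": 1200.0, "motor": 900.0}


def omega(task: str) -> float:
    return 3600.0  # assumed time to optimize the wiring instructions


def phi(task: str, staff: List[str]) -> Tuple[float, List[str]]:
    # Assumed selection: keep the profiles for which an adaptation is defined.
    return 60.0, [s for s in staff if s in ADAPT_COST]


def g(task: str, profile: str) -> float:
    return ADAPT_COST[profile]


total = design_time("wiring", ["cognitive", "visual", "none"], omega, phi, g)
```

With these placeholder values the result is 3600 + 60 + 1800 + 1200 = 6660 seconds, which makes visible that the summation over worker profiles grows with every additional profile class in the task force.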

3.2.2 Modelization of execution phase

An assembly task t requires by definition the manipulation and joining of parts (cables, in our case study). The task of wiring a cable requires the consideration of a number of factors: understanding of the assembly instruction, grasping the cable, determining the relative positions of the cable and its target component in physical space, transporting the cable towards the component, and inserting the cable accurately. Afterwards, the process is typically followed by quality control procedures.

The interpretation time of an instruction i for a worker w is defined as follows: Let α(w,i) be a function that describes the reading complexity (e.g., average complexity of the symbols in the description of the instruction), β(i) the average vocabulary length of the material that describes the task, γ(w) the interpretation ability of the worker weighted in terms of reading, hearing and tactile skills, and \(\lambda(w,w_{t})\) the function that quantifies the worker fatigue and stress at a given moment \(w_{t}\). Then, the overall interpretation time of an instruction i can be formulated as \({\Phi }: \mathbb {R}^{4}\rightarrow \mathbb {R}\):

$$ h(w,i) = {\Phi}(\alpha(w, i), \beta(i), \gamma(w), \lambda(w,w_{t})) $$

The above formulation considers, for example, that people with dyslexia find short statements easier to understand than longer sentences. Moreover, it considers human factors like fatigue and stress that affect the ability to accurately interpret the assembly instructions.

Once the purpose of the assembly instruction is clearly understood, workers have to wire the cable that the instruction refers to. Let \(f(w, i_{c}, i)\) be the time needed for a worker w to find a cable \(i_{c}\) described in an instruction i, and \({\Lambda}(w, i_{c}, i)\) be the average time required for the worker to wire it. Let \(q(t): \mathbb {P} \rightarrow \mathbb {R}^{2}\) give the pair \((e_{t}, e)\), where \(e_{t}\) is the average time required to visually inspect the unit and perform conductivity tests, and e is a possible faulty instruction detected in the process. Assume \(d(e) = {{\sum }_{i}^{k}} d_{t}(k)\) to be the average time to disassemble and reassemble the instruction e. Then Eq. 3 describes how different instructions and worker profiles can affect overall productivity.

$$ \begin{array}{@{}rcl@{}} \rho(t,w) &=& \sum\limits_{i=1}^{len(t)} h(w, i) + s(i_{c}) + f(w, i_{c}, i) \\ &&+{\Lambda} (w,i_{c},i) + e_{t} + \sum\limits_{j=e}^{len(t)} d(j), (e_{t},e)\in q(t) \end{array} $$

Where \(s(i_{c}) \geq 0\) is the restock time for the component \(i_{c}\).
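The execution-time model of Eq. 3 can be sketched numerically as follows. The sketch rests on two assumptions that the text does not state explicitly: the first summation is read as covering all four per-instruction terms (interpretation, restock, search, and wiring), and the rework summation is read as running from the faulty instruction e to the last instruction. The class `InstructionTimes`, the function `execution_time`, and all timing values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InstructionTimes:
    interpret: float  # h(w, i)
    restock: float    # s(i_c), must be >= 0
    find: float       # f(w, i_c, i)
    wire: float       # Lambda(w, i_c, i)
    rework: float     # d(i), time to disassemble and reassemble instruction i


def execution_time(
    steps: List[InstructionTimes],
    inspection_time: float,             # e_t
    faulty_index: Optional[int] = None  # e, 1-based; None when no fault is found
) -> float:
    """Assumed reading of Eq. 3 for one worker and one task."""
    assembly = sum(s.interpret + s.restock + s.find + s.wire for s in steps)
    rework = 0.0
    if faulty_index is not None:
        # Rework every instruction from the faulty one to the end of the task.
        rework = sum(s.rework for s in steps[faulty_index - 1:])
    return assembly + inspection_time + rework


# Made-up timings in seconds for a three-instruction task.
steps = [
    InstructionTimes(5.0, 0.0, 3.0, 10.0, 20.0),
    InstructionTimes(5.0, 2.0, 4.0, 12.0, 25.0),
    InstructionTimes(4.0, 0.0, 3.0, 9.0, 15.0),
]
no_fault = execution_time(steps, inspection_time=30.0)
fault_at_2 = execution_time(steps, inspection_time=30.0, faulty_index=2)
```

Under these placeholder values, a fault detected at the second instruction adds 40 seconds of rework to the 87 seconds of a fault-free run, which illustrates why early detection of assembly mistakes matters for overall productivity.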

3.2.3 New modelization

Each operation modeled in the equations above requires a different level of haptic and visual guidance to efficiently enhance the skills of the worker. If multiple cables are to be inserted into a single switch hole, then further cognitive considerations, such as planning, are required. If assembly tasks can have such cognitive activities, then we must demonstrate and observe the effect of unique human-machine interactions and instruction formats in the productivity and learning of assembly tasks.

The technological enabler of the flexibility and customizability of our work is a condition-based rule system. The decision module enables the system to adapt to unforeseen variables during the design phase, e.g., to contemplate interaction behaviors based on different worker profiles and hardware components, without a substantial increase in the complexity of the design process. Therefore, our research work aims to minimize the cost of the term in Eq. 4, which is part of Eq. 1, through the definition of a progressive skill model.

$$ \sum\limits_{w=1}^{len(\mathbb{W})} g(t, w), (k, \mathbb{W}) = \phi(t,\mathbb{S}), \mathbb{W} \subseteq \mathbb{S}, k \in \mathbb{R} $$

The definition of a progressive skill model enables assembly instructions to be automatically adapted to different workers’ profiles, not only in terms of information visualization modality but also in terms of interaction modalities. Examples include the inclusion of gamification and the superimposition of less information whenever experience analysis shows that workers already know how to assemble certain sub-parts, which amounts to an optimization of Φ.
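One possible reading of such a progressive skill model is sketched below in Python: the amount of superimposed information decreases as a worker accumulates successful assemblies of a given sub-part. The class name `SkillModel`, the thresholds, and the three detail levels are illustrative assumptions and do not describe the actual model used in our system.

```python
from collections import Counter


class SkillModel:
    """Tracks successful assemblies per (worker, sub-part) pair."""

    def __init__(self, novice_below: int = 3, expert_from: int = 10) -> None:
        self.novice_below = novice_below  # hypothetical threshold
        self.expert_from = expert_from    # hypothetical threshold
        self.completed = Counter()

    def record_success(self, worker_id: str, subpart: str) -> None:
        self.completed[(worker_id, subpart)] += 1

    def detail_level(self, worker_id: str, subpart: str) -> str:
        """Return how much instructional content to superimpose."""
        n = self.completed[(worker_id, subpart)]
        if n < self.novice_below:
            return "full"      # step-by-step guidance with all media
        if n < self.expert_from:
            return "summary"   # condensed instruction
        return "minimal"       # only the target location is highlighted


model = SkillModel()
for _ in range(3):
    model.record_success("worker-1", "terminal-block")
level = model.detail_level("worker-1", "terminal-block")
```

After three recorded successes the hypothetical worker moves from full guidance to a summarized instruction, while an unseen sub-part still receives full guidance.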

The modularity of the proposed system facilitates its integration with external tracking and simulation algorithms, which can provide additional inputs regarding incorrect assembly actions, system failures, and encode events in immersive actions surrounding the worker. A decision engine is utilized to analyze machines and monitor the worker’s activity, translating facts and events into courses of action which can be carried out by the system.

4 Proposed methodology and system

Although collaborative human-robot production cells are an intriguing prospect for companies, the complexity of programming environments that integrate complex variables like human workers and a large number of sensors remains one of the major hurdles preventing flexible automation using hybrid industrial robotics.

The proposed system provides two distinct user workflows: One for content creators and administrators, and one for workers. The content authoring tool empowers users with mechanisms to deliver differentiated content for various devices and worker profiles, as per Section 4.1. The workflow is also optimized to reduce the time required to collect media and present live data to shop-floor workers. The assembly workflow tool enhances the interaction between human workers and machines with information by providing machines with information about workers’ activities and needs, while concurrently augmenting the workers’ abilities to work effectively and efficiently with the machinery, as per Section 4.2. In Section 4.3, we describe how these two workflows integrate with each other.

4.1 Experience authoring

The proposed solution describes a set of mechanisms to automate data ingestion (e.g., electronic schemes, wiring instructions, 3D models of electronic parts). Data ingestion can occur in two distinct phases of the XR application: During application design (commonly used for static datasets) and on-demand during the execution of the application (commonly using data collected directly from sensors).

Figure 3 provides an overview of the data ingestion process. In the proposed implementation, users can choose to ingest data through networked services (e.g., sensors, machine-generated data), capture it with a Microsoft HoloLens, or produce it with mobile applications. Wiring instructions in electronic specification schemes and/or Excel format can be dragged into a simple web interface to be automatically translated into XR instructions. Although workers are not required to follow any specific assembly order, our field tests demonstrated that humans prefer wiring the longest cables first. This preference was considered for the presentation of instructions.
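The ordering preference described above can be sketched as a small ingestion step. The Python example below parses wiring instructions from a CSV export and presents the longest cables first. The column names (`cable_id`, `source`, `destination`, `length_mm`) and the sample rows are hypothetical, since the actual spreadsheet layout is not specified here.

```python
import csv
import io
from typing import Dict, List

# Hypothetical CSV export of a wiring spreadsheet.
SAMPLE = """\
cable_id,source,destination,length_mm
W1,X1:1,Q1:3,120
W2,X1:2,Q2:1,450
W3,X2:1,Q1:4,300
"""


def load_instructions(text: str) -> List[Dict[str, object]]:
    """Parse wiring rows and convert the cable length to a number."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({**row, "length_mm": float(row["length_mm"])})
    return rows


def order_longest_first(rows: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Sort instructions so that the longest cables are presented first."""
    return sorted(rows, key=lambda r: r["length_mm"], reverse=True)


ordered_ids = [r["cable_id"] for r in order_longest_first(load_instructions(SAMPLE))]
```

Because Python's `sorted` is stable, cables of equal length keep the order in which they appear in the source file.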

Fig. 3 Design workflow for the assembly of medium-voltage switches in shop-floors

System-generated XR instructions do not contain enough information to accurately locate assembly pieces in the physical environment. Therefore, it is necessary to manually tell the system what parts look like so that they can be tracked. It is also necessary to visualize non-spatial content (e.g., task descriptions) in appropriate locations, using devices such as projectors and Microsoft’s HoloLens. To automate part of this process, designers can define working areas for parts with predetermined starting locations. These areas define where components can be found or placed and where interfaces like social feeds, gamified elements, and other kinds of system messages can be manipulated. During an import phase, instructions are automatically mapped (source or destination) to these areas. Afterwards, it is necessary to further map and animate elements to enhance the XR experience. This task can be executed directly within HoloLens by moving imported elements around or within a web browser with the projector perspective. We have not yet implemented any additional automation module. However, the system is designed to be extended, supporting modules using a variety of tracking and coordinate systems. Since those modules might have their own data formats and coordinate systems, we have also defined data fusion mechanisms to handle the data transformation between modules.
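To illustrate the kind of transformation that such data fusion involves, the following Python sketch converts a point reported in a tracking module's own coordinate frame into a shared workspace frame using a 2D rigid transform (rotation followed by translation). This is a generic technique chosen for illustration, not a description of our data fusion mechanism; the function name and calibration values are hypothetical.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]


def make_rigid_transform(angle_deg: float, tx: float, ty: float) -> Callable[[Point], Point]:
    """Build a function mapping module coordinates to workspace coordinates."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)

    def transform(p: Point) -> Point:
        x, y = p
        # Rotate by the given angle, then translate by (tx, ty).
        return (cos_a * x - sin_a * y + tx, sin_a * x + cos_a * y + ty)

    return transform


# Hypothetical calibration: the tracker frame is rotated by 90 degrees and
# offset by (100, 50) millimetres relative to the workspace frame.
tracker_to_workspace = make_rigid_transform(90.0, 100.0, 50.0)
wx, wy = tracker_to_workspace((1.0, 0.0))
```

A point one unit along the tracker's x axis ends up approximately at (100, 51) in the workspace frame, up to floating-point rounding.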

4.2 Worker-personalized interaction

Our system relies on a Rule Management System (RMS) to enforce condition-based decisions that use technology installed on the shop floor to provide workers with an augmented assembly experience that seamlessly spans over disparate interaction devices, namely mobile tablets and Microsoft’s HoloLens. The RMS creates a relation-based representation of live in situ data, and then codifies it into a database that is used for reasoning. Like other RMSs, our system depends on two types of memory: A working memory to hold data facts that describe the domain knowledge, and a production memory to monitor data events represented as conditional statements. The reasoning engine is responsible for monitoring changes in the network and matching present facts with the rules and constraints defined for each problem.

The reasoning engine organizes and controls workflow based on forward and backward reasoning methods. In the first method, the RMS begins from a set of initial facts, which are then used to determine new facts (or sub-flows) until it reaches its final goal. In the latter method, the engine starts from its final goal and then searches for rules that lead to it.
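The two reasoning methods can be contrasted with a minimal Python sketch. It is a simplified, generic illustration of forward and backward chaining over propositional rules and does not use or reproduce the rule engine of our system; the rule set and fact names are hypothetical.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

Rule = Tuple[FrozenSet[str], str]  # (premises, conclusion)

# Hypothetical rules for a single wiring step.
RULES: List[Rule] = [
    (frozenset({"cable_detected", "instruction_shown"}), "wiring_started"),
    (frozenset({"wiring_started", "conductivity_ok"}), "step_completed"),
    (frozenset({"step_completed"}), "show_next_instruction"),
]


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Start from known facts and derive new ones until nothing changes."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def backward_chain(goal: str, facts: Set[str], rules: List[Rule],
                   seen: Optional[FrozenSet[str]] = None) -> bool:
    """Start from the goal and search for rules whose premises can be proven."""
    seen = seen or frozenset()
    if goal in facts:
        return True
    if goal in seen:  # avoid looping on circular rules
        return False
    seen = seen | {goal}
    return any(
        all(backward_chain(p, facts, rules, seen) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )


facts = {"cable_detected", "instruction_shown", "conductivity_ok"}
derived = forward_chain(facts, RULES)
reachable = backward_chain("show_next_instruction", facts, RULES)
```

Forward chaining suits event-driven updates, where each new sensor fact may trigger further actions, whereas backward chaining suits queries about whether a specific goal can currently be reached.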

Drools [11] is an example of a Business Rules Management System and also the solution adopted in our system (see Fig. 4). It delivers a solution for creating, managing, deploying, and executing rule-based workflows. Drools offers a very accessible Application Programming Interface (API), allowing rules to be easily modified by humans and digital processes alike. One advantage of Drools is its ability to apply hybrid chaining reasoning, which is a mix between the forward and backward chaining of traditional RMSs.

Fig. 4 Business rules management system

The RMS components enable us to deploy a system that is self-configuring and modular. It allows for a seamless integration of intelligent modules while maintaining the value of human elements and, above all, keeping the design process simple. In our test case, workflow rules were defined by humans at the modeling phase. However, through the application of artificial intelligence algorithms, these rules could be updated and/or created dynamically. Furthermore, within a workflow, specific modules might implement their own processes to create rules that help to accomplish subtasks such as path finding and collision detection.

In our implementation (Fig. 5), content creators can define conditions based on what type of devices are connected and/or data events (e.g., sensor value). Conditions might be limited to a set of target devices that handle specific events expected by the RMS. Actions are defined using the authoring API in the same fashion as the authoring of an XR instruction.

Fig. 5 Proposed workflow for assembly tasks in shop-floors

In Section 4.3, we explain how our system integrates different teaching and assistive modalities which are not limited to the visualization of primitive types of multimedia content. The restriction to primitive types of multimedia content inhibits learning because it constrains workers to interactions with sequential mediums of information, whereas the information being taught may not be best conveyed in that form. The different teaching and assistive modalities have been encoded in three groups of messages modeled in JavaScript Object Notation (JSON) format [9]: graphical elements, data, and event triggers (or “constraint-based events”).
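The three message groups could be represented along the following lines. The Python sketch below builds one example message per group and validates the group of an incoming message. Every field name and value (`group`, `target_area`, `condition`, and so on) is a hypothetical placeholder, since the concrete message schema is not reproduced in this section.

```python
import json
from typing import Any, Dict

MESSAGE_GROUPS = {"graphical_element", "data", "event_trigger"}

# One illustrative message per group (hypothetical schema).
messages = [
    {
        "group": "graphical_element",
        "id": "arrow-01",
        "shape": "arrow",
        "target_area": "terminal_block_A",
        "devices": ["hololens", "projector"],
    },
    {
        "group": "data",
        "source": "conductivity_probe",
        "value": True,
    },
    {
        "group": "event_trigger",
        "condition": {"source": "conductivity_probe", "equals": True},
        "action": "show_next_instruction",
    },
]


def parse_message(raw: str) -> Dict[str, Any]:
    """Decode a JSON message and check that it belongs to a known group."""
    msg = json.loads(raw)
    if msg.get("group") not in MESSAGE_GROUPS:
        raise ValueError(f"unknown message group: {msg.get('group')!r}")
    return msg


# Round trip: serialize each message and decode it again.
decoded = [parse_message(json.dumps(m)) for m in messages]
```

Keeping the group as an explicit field lets a receiving device ignore message types it cannot render or evaluate.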

4.3 Industrial Internet of Things (IIoT) and experience authoring

In order to allow the seamless interaction between machines in the physical environment, we needed to implement a communication bridge for each sensor, PLC, device, and robot on the shop floor. Such a bridge brings connectivity to devices and provides a way to translate device-specific events between machines and the overall system. Communication bridges do not necessarily implement machine control functions. They can be used to implement digital twins for physical objects—even before they are built. A digital twin is a computerized (digital) representation of a physical asset, on top of which data can be visualized and analyzed [42]. This digital representation might include a live and comprehensive description of any given physical object. Such a representation would be useful in simulation models [61].

The implementation of digital twins for different machines enables designers to run simulations that demonstrate integrated systems prior to their deployment and to predict the time required for their installation. When the twinned asset is in physical operation, the digital twin can be used to predict component failure. Hence, once implemented, the digital twin can be used during the entire life cycle of the device.

The digital twin paradigm does not inherently require a visual approach. A digital twin can exist as long as sensors on the physical object capture data about its condition and feed it to other systems via some form of IIoT connection.

In our implementation, data generated by machines is associated with unique identifiers in the digital twin, e.g., a joint of a robot. This approach enables the physical system and its submodules to feed the virtual representation of the physical space with real-time streams of data. In addition to providing information on the state of the device, the digital twin also works as an interaction API, e.g., a client can request the robot to pick an object at a given position by inputting new values for each sensor composing the robot in the digital twin. However, for this interaction methodology to work, the communication bridge must translate sensor values received from the XR system into device-specific commands that would result in sensor updates.
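The two directions of this exchange can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the class names, the sensor identifiers, and the `MOVE` command format are all hypothetical, and a real bridge would speak the protocol of the specific device.

```python
from typing import Dict, List


class DigitalTwin:
    """Holds the latest reported value for each uniquely identified sensor."""

    def __init__(self) -> None:
        self.state: Dict[str, float] = {}

    def update(self, sensor_id: str, value: float) -> None:
        self.state[sensor_id] = value


class CommunicationBridge:
    """Translates between device-specific messages and the digital twin."""

    def __init__(self, twin: DigitalTwin) -> None:
        self.twin = twin
        self.sent_commands: List[str] = []

    def on_device_reading(self, sensor_id: str, value: float) -> None:
        """Device to twin: store a real-time reading under its unique identifier."""
        self.twin.update(sensor_id, value)

    def on_twin_request(self, targets: Dict[str, float]) -> List[str]:
        """Twin to device: turn requested sensor values into device commands.

        The twin state is not changed here; it is updated only when the
        device later reports the new sensor values.
        """
        commands = [f"MOVE {sid} {val:.1f}" for sid, val in sorted(targets.items())]
        self.sent_commands.extend(commands)
        return commands


twin = DigitalTwin()
bridge = CommunicationBridge(twin)

# The robot reports a joint angle, which lands in the twin.
bridge.on_device_reading("robot1/joint1", 12.5)

# The XR system requests new joint values; the bridge emits commands.
commands = bridge.on_twin_request({"robot1/joint2": 30.0, "robot1/joint1": 45.0})
```

Keeping the twin read-only on the request path reflects the point made above: requested values become commands, and the twin changes only through the resulting sensor updates.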

Communication bridges are also fundamental to seamless communication between interactive devices. In the design phase of an application, they enable a number of different devices to be used for the authoring of assembly instructions. One authoring approach made possible by these bridges is the modeling of the XR interaction using both a web editor (import feature) and an MR HMD such as Microsoft’s HoloLens. The designer can use the HoloLens to model 3D aspects of the assembly instructions that would otherwise require the use of a CAD system, e.g., to perform spatial operations such as placing and picking objects in 3D space. The content creation process can also be supported by mobile tablets, which make it easy to integrate 2D multimedia content into a 3D scene. When Microsoft’s HoloLens is paired with traditional technology that is more efficient in content creation, the time to design a process can be significantly reduced and the calibration between digital content and the physical space can be spatially maintained. Both tablets and HoloLens devices can provide similar authoring functionalities. However, the authoring is more efficient when the two are used in combination. In our tests, we used tablets to import the schematics and insert additional comments and graphical elements. Traditional 2D point-and-click metaphors proved to be especially efficient to implement.

5 Evaluation

Our evaluation determines the extent to which our content creation approach demonstrates any promising benefits for short-lived tasks. Since immersive approaches for human-robot interaction are new enough not to have an established baseline, our evaluation focuses on the following goals: (1) to assess the usability of our system, (2) to collect qualitative feedback on the design of the system, and (3) to record how people interact with the remote system via the provided user interface (UI).

Two user studies were performed to meet these goals relating to a digital assembly system design. Two user populations were considered: the production manager (see Fig. 1) and the shop floor worker. The experiments involved a total of 10 participants, one production manager and ten shop floor workers.

5.1 User study 1—design of information sharing

We propose an authoring system that can quickly adapt to different assembly scenarios. However, the flexibility of the system is constrained by the skill of the worker using the system. The questions we seek to answer with this study are: (RQ 1) Can new and infrequent users use and effectively exploit the features of the authoring tool to create spatially augmented assembly stations? (RQ 2) To what extent are users capable of understanding and modeling the interaction for the worker skills?

The hardware consisted of a projector, 3D camera, and tablet (Surface Pro). The authoring interface is HTML-based and directly accessible from the tablet. Each participant (production manager) was assigned a task that entailed the design of an XR tutorial to guide workers in their assembly workstations. The authoring steps consisted of (1) ingesting schematics, (2) collecting any multimedia material necessary to support the operator during the task (e.g., images and videos), and (3) tuning the system and peripherals to effectively adapt to a set of worker disabilities.

In the evaluation of this case study, we applied the cognitive walkthrough method validated by [48]. This method is particularly appropriate for evaluating system learnability for new and infrequent users.

5.2 User study 2—system usability

Usability is a key factor leading to the success of any interactive system. Thus, it is useful to have an evaluation method that entails reliable measures of usability. Based on the definition of ISO 9241-11 (Ergonomics of human-system interaction - Part 11: Usability: Definitions and concepts), system usability can be measured from three perspectives: effectiveness (whether the system lets users complete their tasks), efficiency (the extent to which users expend resources in achieving their goals), and satisfaction (the level of comfort users experience in achieving those goals). Tools like the Computer System Usability Questionnaire (CSUQ), Questionnaire for User Interface Satisfaction (QUIS), System Usability Scale (SUS), Post-Study System Usability Questionnaire (PSSUQ), and Software Usability Measurement Inventory (SUMI) are commonly used to obtain measurements that conform to this definition of usability.

In this user study, the usability (effectiveness, efficiency, and satisfaction) of the system was measured with the SUS questionnaire [2]. SUS consists of ten questions with responses on a 5-point scale from “Strongly disagree” to “Strongly agree.” It is significantly shorter than SUMI, and recent psychometric analyses have demonstrated that it also provides reliable measures of perceived “learnability.” Furthermore, SUS can be applied to a wide range of technologies, including those that have not yet been invented [3, 55], which makes it suitable for a novel immersive system such as the one proposed here.

The experimental setup consisted of a dual-arm collaborative robot, a projector, a HoloLens, a 3D camera, and an analog button to enable the human-system interaction. The prototype was designed to draw worker attention to relevant information about the task (e.g., highlighting components, depicting symbols to identify error-prone steps, and associating aural notifications with different states of the task), and to provide step-by-step guidance with media (e.g., 3D animations, sounds, colors, videos, and interactive checklists).

The information provided to workers consisted of the following elements: a checklist and warning messages displayed on the HoloLens, or projected on designated spaces when the device was offline; visual highlighting of the items to be assembled (augmented with aural messages if workers had vision disabilities); and textual instructions personalized to different visual disabilities in size, color, and content (two versions of the instructions were defined, one more detailed than the other).

The robot was responsible for stock-feeding the components to workers as swiftly as possible and for ensuring the correct cable was connected. Workers were provided a visual highlight of the location on the table of the next component to be installed, as well as where on the product the installation was to take place. Information was provided through the individual’s HoloLens, aural communications, or projection mapping. The task required the worker to wire eight different cables and to follow a few assembly conventions for the positioning of the cables. After the task, automated tests provided visual feedback on the assembled unit.

During the experiment, we recorded the workers’ interaction with the system, with the consent of all workers, to identify limitations and suggest better requirements for the system. Afterwards, we invited the participants to complete the SUS questionnaire, the results of which are discussed below.

6 Observations

After the experiments, we calculated the average of the usability values of all participants to obtain the SUS score (Table 1). The mean SUS score is 75.71, the median is 77.50, the maximum is 87.50, and the minimum is 67.50. The overall usability score falls within the range of good usability according to [2]. Figure 6 summarizes the results for each question of the questionnaire.
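For reference, the standard SUS scoring procedure can be sketched as follows: odd-numbered items contribute their response minus 1, even-numbered items contribute 5 minus their response, and the sum is multiplied by 2.5 to give a score between 0 and 100. The source does not state that the authors used a script like this one, and the responses below are made-up placeholders, not the study data.

```python
from statistics import mean, median
from typing import List


def sus_score(responses: List[int]) -> float:
    """Compute the SUS score (0 to 100) from ten responses on a 1 to 5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for index, response in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: higher is better.
            total += response - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: lower is better.
            total += 5 - response
    return total * 2.5


# Placeholder responses for three hypothetical participants (not study data).
example_responses = [
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
    [5, 1, 5, 2, 4, 1, 5, 2, 4, 2],
    [3, 3, 4, 2, 4, 2, 3, 3, 4, 2],
]

scores = [sus_score(r) for r in example_responses]
summary = {
    "mean": mean(scores),
    "median": median(scores),
    "max": max(scores),
    "min": min(scores),
}
```

Because even-numbered items are reverse-scored, a low raw average on such an item indicates a favorable response.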

Table 1 Mean and standard deviation for each item in the questionnaire

Fig. 6 Candlestick chart for the SUS scores of each question

In the results summarized in Fig. 6, we observed a positive sentiment towards the usefulness of the system (average score of 4.57), which led to a willingness among users to continue using the system. The perception of a system well integrated with their physical space and task goals received the highest user score (average score of 4.71). Participants also reported that information was easily accessible and consistent with their needs (average score of 1.43). We observed lower scores in features like simplicity (average score of 2.0), ease of use (average score of 4.0), and prior knowledge (average score of 2.0). We observed in a post-interview that participants were not familiar with the technology, e.g., cobots and projection technology. The entire system was a novelty to the workers. Nevertheless, the overall reaction was that they felt very confident while interacting with the system (average score of 4.43).

An analysis of the responses collected with the cognitive walkthrough method revealed minor issues in the usability of the system that were not critical to the completion of the task (research question 1), e.g., menu labels that could be more intuitive. Participants succeeded in creating a digital manual for the schematics assigned as an exercise. In a post-interview, participants described the overall tool as intuitive and flexible. With regard to research question 2, participants had difficulty autonomously redefining the proposed interaction model created by the system for the different skills and disabilities. A few users suggested that it would be better to include interaction guidelines to support their personal views. Hence, we conclude that production managers would need additional help to understand how to customize interaction to different worker skills.

7 Discussion and limitations

In the assembly of medium-voltage switches, technicians must obtain a variety of assembly skills and knowledge to effectively work with many different products. Traditional methods to enhance the skills and abilities of the worker, such as on-the-job training (OJT), cannot fulfill the requirements that are expected in future trends for the assembly sector. A new human augmentation system could assist workers both with learning assembly tasks and with executing them. The purpose of this work was to analyze the use of XR systems as a training and supervising medium for workers with disabilities in industrial settings. Modern shop floors are equipped with abundant numbers of sensors that have the potential not only to help train workers to perform a task, but also to train them to perform alongside collaborative robots. Furthermore, the integration of XR with the IIoT has many valuable applications, such as the visualization of information collected and summarized by machines, the easy reprogramming of machines to prioritize given tasks, and the facilitation of the use of cobots.

Unfortunately, industrial environments present a severe challenge to XR applications. Usually, these environments embed multiple network architectures with different levels of visibility and security. Traditional applications might not be designed to communicate across disparate networks, which reduces their interaction potential to a limited number of sensors and devices. Cloud-based systems can overcome this problem, but they fail to provide real-time interaction due to the latency between networks. Our system facilitates cross-network communication if communication bridges are deployed in each network. However, further tests will be required to validate the solution from a cybersecurity perspective, as this approach to computation is traditionally fraught with vulnerabilities.

We observed that a combination of Microsoft’s HoloLens, projectors, and mobile tablets can yield positive benefits to human safety and human-machine interaction. However, we found a number of issues when this technology is deployed in operational environments. The field of view and battery life of Microsoft’s HoloLens hardly meet the requirements of continuous assembly production. Projectors operate on lamps that have to be replaced periodically and need more maintenance than flat-screen monitors. Moreover, participants with cognitive disabilities reported having no previous experience with the technology and required occasional interaction reminders.

These requirements are at odds with the conditions in assembly environments and require proper attention while designing the workspace. In future work, we need to investigate how to keep the conversation between the system and workers with cognitive disabilities active, e.g., by prompting information about the current action and reminding the worker how to interact with the system once the action is completed. Workers with experience often ignored digital instructions, but found the system useful for its alerts about potential assembly errors. Overall, the system has shown promising results in teaching and supervising workers with disabilities who would otherwise require human supervision. Yet there are safety concerns that must be further investigated, e.g., proximity to robots.

8 Conclusions and future work

Recent research has shown that about 60% of all occupations will have at least 30% of their activities automated by 2030 [5]. According to a study conducted by the McKinsey Global Institute (MGI) [38], between 400 million and 800 million individuals worldwide could be displaced by automation and need to find new jobs by 2030, including up to one third of the workforce in the USA and Germany. Consequently, the current labor market will crash in a fraction of a generation, and the availability of jobs not requiring proficiency with advanced technology will dwindle [29]. However, for the near future, most jobs entail major requirements that cannot be handled by computers alone [44]. Hence, when we discuss how automation technology should be deployed in workplaces, we must also emphasize the importance of augmentation rather than automation. Our contribution goes exactly in this direction. We describe how XR can deliver a live support system for on-the-job skill refinement and yield greater benefits to human safety and product quality than traditional methods in industrial manufacturing.

Our system is designed to augment, in a visual manner, human workers with the information necessary to undertake specific, frequently changing assembly tasks. Specifically, the system supports workers in learning assembly tasks one at a time and assists them in inspecting and performing them. Additionally, it introduces an immersive approach for the creation of new assembly manuals, greatly streamlining the teaching process. Workers’ disabilities were considered in the definition of how the XR system interacts and how information is presented, e.g., slower versus faster animations for mentally impaired workers, personalized color palettes for each worker, simplified and magnified representations for workers with reduced sight, and use of haptic sensors for extra feedback, in an attempt to maximize workers’ satisfaction with the hybrid human-machine environment and, hence, bolster their productivity. By providing a reliable and efficient means to make existing workers better and safer at their jobs, our research not only increases their intrinsic value to manufacturing firms but also bolsters their satisfaction and fulfillment with their duties.

Future research will aim to better understand from a cognitive perspective how humans interact with virtual objects in an XR environment so as to be able to design a system which effectively and appropriately augments human performance through the detection of cognitive needs rather than through users’ deliberate actions. In these endeavors, it will be necessary to investigate virtual object interaction techniques in XR in terms of hand-based manipulation and precise selection mechanisms, and the extent to which they relate to existing literature. The central hypothesis is that relevant cognitive bottlenecks can be systematically identified and addressed by an intelligent cognitive architecture that receives input from XR hardware, biometric sensors, workplace data monitors, and intelligent machinery, and that this information can be leveraged to achieve effective human augmentation. This hypothesis can be evaluated by examining behavioral reactions to XR systems using appropriate cognitive frameworks, including models exploring information processing via attentional control and working memory [1] and the postulated need and importance of humans to form mental models of task actions, context, and knowledge [30]. Using these frameworks as a guide to understand what users could be experiencing, we aim to document and capitalize on existing knowledge to extend the system to address specific deficits and shortcomings in human performance. The rationale behind this future research is that the possible modes of user interaction must be understood in detail in order to design the most effective XR infrastructures with our system. A systematic analysis of the incorporation of cognitive architectures into our XR system will allow intelligent machines to proactively assume new types of automation responsibilities, freeing cognitive resources in their human counterparts.