1 Introduction

Augmented Reality (AR) is a novel technology with the ability to combine spatially mapped digital and real content in an interactive and multimodal interface [1]. As such, AR can serve as a human–machine interface (HMI) and is capable of enhancing the flexible skills of human workers in an Industry 4.0 environment [2]. Offering a more intuitive approach to human–robot collaboration, AR-based robot programming could be a potential alternative to conventional online and offline programming.

In the literature, a variety of systems realising AR-assisted robot programming has been developed, ranging from path point modification [3] and trajectory planning [4] to collision detection [5] and human–machine collaboration [6]. Natural gesture-based programming methods in particular have shown higher efficiency as well as good user satisfaction when compared to conventional programming methods [7]. However, a breakthrough of AR in industrial robot programming beyond the tier of proof-of-concept solutions has not been achieved yet.

Limited by the stable accuracy currently available, scenarios beyond pick-and-place applications [8] are difficult to industrialise. With recent advances, such as the introduction of additional equipment like three-dimensionally tracked styluses [9] or external LIDAR sensors [10], performance, throughput, and accuracy can be increased. Nevertheless, AR is not yet powerful enough to serve as a standalone alternative robot programming method in high-tier automation.

Hence, we chose a different approach and do not view AR-assisted robot programming as an alternative but as an enhancement of existing conventional robot programming methods. Especially in high-tier automation industries, where a combination of offline planning and programming with online commissioning and optimisation is characteristic, AR can smooth the transition between the two phases.

With AR we can, on the one hand, assist the worker in the shopfloor environment with additional simulative abilities. On the other hand, deviations can be detected and directly corrected by comparing the digital model to the real workstation, thus creating a more accurate digital representation. Based on this motivation, we developed a system that utilises the visual and interactive abilities of AR to harness the features of an offline robot programming system inside a factory environment and to work with programs on a real robot [11]. While this approach is generally feasible, more detailed work on the assessment of accuracy, efficiency, and satisfaction, i.e. usability [12], is necessary.

In the following paragraphs, we will give a brief overview of our developed system, after which we will define the usability of a process and introduce different methods to measure the individual dimensions. In the end, we will present the results of two studies and discuss the usability of our system in the scope of AR-assisted robot programming.

2 MiReP—Mixed Reality Programming

We aim to smooth the transition from the digital planning environment to the real workstation by utilising AR in the commissioning of an offline created robot program. We chose a modularised architecture as our basic design pattern to combine the functional range of offline robot programming systems with the multimodal interactive capabilities of AR. Adhering to the dependency inversion principle [13], we split our application into one core and six independent microservices.

Figure 1 (left) shows the general design of our application. Each microservice is implemented as an interchangeable plugin that adheres to a standardised interface defined by the core. As the core orchestrates the different plugins using only the standardised functions of this interface, each system component is enclosed in an independent shell. This not only increases testability and opens the possibility of decentralised, cloud-computed systems, but also creates inherent extensibility. If, for example, a new plugin for a different AR device is developed, it can be introduced to the system without the necessity to update the entire infrastructure.
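
As a minimal sketch of this pattern, the following Python snippet shows a core that orchestrates plugins exclusively through one standardised interface. The class and method names are illustrative assumptions, not the actual MiReP API.

```python
# Minimal sketch of the plugin contract; names are illustrative assumptions.
from abc import ABC, abstractmethod


class Plugin(ABC):
    """Standardised interface every microservice plugin implements."""

    @abstractmethod
    def name(self) -> str:
        """Unique identifier used by the core to address the plugin."""

    @abstractmethod
    def handle(self, request: dict) -> dict:
        """Process a core request and return a standardised response."""


class Core:
    """Orchestrates plugins exclusively through the standardised interface."""

    def __init__(self) -> None:
        self._plugins: dict[str, Plugin] = {}

    def register(self, plugin: Plugin) -> None:
        # A new plugin (e.g. for a different AR device) is added here
        # without touching the rest of the infrastructure.
        self._plugins[plugin.name()] = plugin

    def dispatch(self, target: str, request: dict) -> dict:
        return self._plugins[target].handle(request)
```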

Fig. 1 Schematic sketches of the general architecture (left) and the implementation (right)

The current implementation, schematically displayed in Fig. 1 (right), utilises a Microsoft HoloLens 2, a Microsoft Controller, and the simulation system Process Simulate (PS). By accessing the API of PS, we can not only import, modify, simulate, and export programs, but also directly export the geometry and position of CAD elements from the digital model. In combination with the model tracking capabilities of the Vuforia engine [14], we can detect CAD elements in the real world and register our system accordingly. After detection, a reference geometry is displayed and the user is prompted to confirm the correctness of the registration, as shown in Fig. 2 (left).
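
Conceptually, this registration is a frame alignment: given the pose of the tracked CAD element in the real (AR) world and its known pose in the digital model, the transform between the two coordinate frames follows directly. The sketch below assumes 4 × 4 homogeneous matrices and hypothetical variable names; it does not reproduce the Vuforia API.

```python
# Hedged sketch of the coordinate registration step.
import numpy as np


def register_model_to_world(T_world_object: np.ndarray,
                            T_model_object: np.ndarray) -> np.ndarray:
    """Return T_world_model so digital content can be anchored in AR.

    T_world_object: detected pose of the CAD element in the AR world frame.
    T_model_object: known pose of the same element in the digital model.
    """
    return T_world_object @ np.linalg.inv(T_model_object)

# Any homogeneous point p_model from the simulation can then be shown in AR:
# p_world = T_world_model @ p_model
```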

Fig. 2 AR view of the MiReP system before and after program optimisation

Hereafter, programs from accessible machines are automatically imported. The user then selects a program to work on in the AR path editor (Fig. 2, right). While visualisation and interaction happen in the scope of the AR and input devices, all calculations regarding reachability, movement simulation, or tool changes are done in the simulation system. When ready, the modified program is re-exported to the associated robot.
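
The resulting round trip can be summarised as in the following sketch, which builds on the hypothetical plugin interface above; all operation names are assumptions standing in for the actual PS and AR plugin calls.

```python
# Illustrative sketch of the commissioning round trip; operation names
# are assumptions, not the actual MiReP or Process Simulate API.
def commission(core: Core, robot_id: str) -> None:
    # Programs of the accessible machine are imported via the simulation.
    programs = core.dispatch("simulation", {"op": "import_programs",
                                            "robot": robot_id})
    # The user picks a program in the AR path editor.
    program = core.dispatch("ar_device", {"op": "select_program",
                                          "programs": programs})
    while not core.dispatch("ar_device", {"op": "user_done"})["done"]:
        edit = core.dispatch("ar_device", {"op": "next_edit"})
        # Reachability, movement simulation, and tool changes are
        # evaluated in the simulation system, not on the device.
        program = core.dispatch("simulation", {"op": "apply_edit",
                                               "program": program,
                                               "edit": edit})
    core.dispatch("simulation", {"op": "export_program",
                                 "program": program, "robot": robot_id})
```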

3 Target Dimensions of Usability

The usability of a system offers a general evaluation of its suitability in a specific use case and is generally defined by three dimensions [12]:

  • Effectiveness: “How completely and accurately a user achieves the defined goal.”

  • Efficiency: “The resources used relative to the achieved accuracy.”

  • Satisfaction: “The perceptions and reactions of a user towards the system.”

When applied to the MiReP application, an HMI that enables a worker to analyse, modify, and evaluate robot programs of a real machine based on a digital simulation model, these generalised dimensions can be concretised.

The effectiveness directly relates to the quality of the modified process executed by the real robot. It comprises both the available functionality, e.g. the modification of the pose of a path point or the correct evaluation of a collision, and the quality of the corrective modification, e.g. its accuracy. The efficiency is defined by the amount of time and effort a user invests to reach the aspired result. The user satisfaction is a more complex parameter, as it embodies multiple interdependent and highly individual factors, such as mental, physical, and temporal load, as well as frustration, effort, and perceived performance.

4 Methodology

Measuring the usability of AR-assisted robot programming is not trivial. While parameters like end-to-end accuracy or duration can be measured absolutely, parameters like effort are coupled with the individual user, the scenario, and the current environment. However, while an absolute scale is difficult to realise, a relative comparison between two processes can be made. Hence, we compare the usability of AR in the commissioning of an offline created robot program with that of the conventional Teach-In.

One applicable standardised questionnaire is the NASA Task Load Index (NASA-TLX) [15]. The questionnaire, shown in Fig. 3, consists of six questions, each targeting a different category; together they offer an assessment of the global workload users perceive while processing their task.

Fig. 3 NASA-TLX questionnaire [15]

Based on the results of the different categories, a global workload index with a range of 0 to 100 can be calculated. The lower the value, the better the result.
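
In its unweighted (“raw TLX”) form, this index is simply the mean of the six subscale ratings; the original procedure additionally weights the subscales via 15 pairwise comparisons. A minimal sketch of the raw variant:

```python
# Raw (unweighted) NASA-TLX: mean of six subscale ratings, each 0-100.
from statistics import mean

SUBSCALES = ("mental", "physical", "temporal",
             "performance", "effort", "frustration")


def raw_tlx(ratings: dict[str, float]) -> float:
    """Global workload index, 0-100; lower is better."""
    return mean(ratings[s] for s in SUBSCALES)


print(raw_tlx({"mental": 40, "physical": 20, "temporal": 30,
               "performance": 25, "effort": 35, "frustration": 15}))  # 27.5
```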

Especially during prolonged work, the ergonomics of a task are an important element of a healthy work environment. As HMIs like the HoloLens increase the strain on the user's neck due to their weight, a detrimental effect on posture is expected. Hence, in addition to the NASA-TLX, a more specific analysis of the working posture is necessary, as bad posture correlates with physical demand.

The Ovako Working Posture Assessment System (OWAS) offers an objective method to analyse the posture of a human over a prolonged period [16]. A score between 1 and 4 is calculated depending on the relative position of the back, arms, and legs as well as the handled load. The scoreboard is depicted in Fig. 4.

Fig. 4 Ovako Working Posture Assessment System (OWAS) [16]

As an example, Fig. 5 shows a user in two different working postures. Regarding the scoreboard, the left user has a bent back (2), both arms below shoulder level (1), squats (3), and handles a load below 10 kg (1), scoring a value of 2, which implies that corrective actions to improve the working posture are required in the near future.

Fig. 5 User during programming with AR (left) and Teach-In (right)

As each OWAS analysis represents only one moment during task execution, it has to be repeated over a prolonged period of time. An overall grade between 100 and 400 is calculated depending on the percentage of time the user stays in a bad posture.
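
The exact aggregation is defined in [16]; one common formulation consistent with the stated 100 to 400 range weights each action category (1 to 4) by the share of observations it received. The following is a minimal sketch under that assumption:

```python
# Hedged sketch of the overall OWAS grade: each observed moment is assigned
# an action category (1-4); the grade weights the categories by their share
# of observation time. All category 1 -> 100, all category 4 -> 400.
def owas_index(categories: list[int]) -> float:
    """categories: one OWAS action category (1-4) per observed moment."""
    shares = [categories.count(c) / len(categories) for c in (1, 2, 3, 4)]
    return 100 * sum(c * share for c, share in zip((1, 2, 3, 4), shares))


# e.g. 70% of the time in category 1, 25% in category 2, 5% in category 3:
print(owas_index([1] * 70 + [2] * 25 + [3] * 5))  # 135.0
```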

As absolute values, accuracy and working time can be measured in a simple experimental setup, as depicted in Fig. 6.

Fig. 6 Sketch of original and aspired contour in simulation (left); view in AR setup (right)

The user modifies an erroneous program (red) by adding and repositioning path points until a defined contour (green) is acquired. The program created thereby is then exported to a robot which, armed with a pen, draws the contour on paper. An assessment of the accuracy and working time can be made by measuring the offset of each path point as well as the time the user took to modify the program.
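
Assuming each drawn path point can be matched to its aspired counterpart, the accuracy metric reduces to a mean Euclidean offset, for example:

```python
# Sketch of the accuracy metric: mean Euclidean offset between drawn and
# aspired contour points (per-point correspondence is assumed).
import numpy as np


def average_offset(drawn: np.ndarray, aspired: np.ndarray) -> float:
    """Mean Euclidean offset (e.g. in mm) over corresponding path points."""
    return float(np.linalg.norm(drawn - aspired, axis=1).mean())


drawn = np.array([[10.2, 5.1], [20.0, 5.3]])    # measured points (mm)
aspired = np.array([[10.0, 5.0], [20.0, 5.0]])  # target contour points (mm)
print(average_offset(drawn, aspired))           # ~0.26 mm
```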

An additional method to evaluate the usability of a system is the System Usability Scale (SUS) [17]. Utilising ten standardised questions, an absolute score between 0 and 100 can be calculated. The corresponding questions are displayed in Table 1.

Table 1 System Usability Scale Questionnaire

Each question is answered on a five-point scale: “Strongly Agree”, “Agree”, “Neutral”, “Disagree”, or “Strongly Disagree”. From that, a global score can be calculated. Generally, a value exceeding 70 indicates good usability.
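
The standard SUS scoring codes each response from 1 (“Strongly Disagree”) to 5 (“Strongly Agree”); odd-numbered items contribute their response minus 1, even-numbered items 5 minus their response, and the sum is scaled by 2.5 to the 0 to 100 range:

```python
# Standard SUS scoring over the ten questionnaire items.
def sus_score(responses: list[int]) -> float:
    """responses: ten values in 1..5, in questionnaire order."""
    assert len(responses) == 10
    contributions = [(r - 1) if i % 2 == 0 else (5 - r)
                     for i, r in enumerate(responses)]
    return 2.5 * sum(contributions)


print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```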

5 Mock-Up and Conduction of the Experiment

We conducted two independent studies with a total of 31 participants. Each time, a user commissioned an offline created robot program with both the AR-assisted and the Teach-In method. In the first study, we used the OWAS to assess the working posture of ten users aged between 19 and 54.

Figure 7 shows the original program in the simulation system as well as the displaced program as viewed in AR. The users were split into two separate groups. After a brief 10-min introduction to either the MiReP system or Teach-In programming, each user commissioned for 30 min. The same procedure was repeated with the other programming method after a short break; the order depended on the group affiliation. Each user was recorded with a camera during execution. The average risk index of the MiReP system is 109/400, while Teach-In programming reaches a value of 104/400.

Fig. 7 Program in simulation system (left); displaced program in reality (right)

In the second study, similar to the previous experiment, two groups of users, 21 participants in total and aged between 17 and 35, commissioned an erroneous offline created robot program in different orders. A sheet of paper indicated the aspired contour as a guideline for the optimisation.

Figure 8 shows a user during the two different tasks. In preparation for the task, each user was given 10 to 20 min of guided preparation with each of the two systems. During training, five of the handled pens were broken due to programming errors while controlling the robot with the Teach-In method.

Fig. 8 User during programming with Teach-In (left) and AR (right)

During the experiment, no pens were broken. However, some users needed additional assistance while using the Teach-In programming due to operating issues.

In the end, every program created with either method was valid and runnable. The calculated results regarding accuracy and working time are shown in Fig. 9.

Fig. 9 Results of accuracy analysis (left) and efficiency analysis (right)

Users filled out a NASA-TLX questionnaire immediately after completing a programming task. The averaged results are displayed in Fig. 10.

Fig. 10 Results of the NASA-TLX

The average global task load of AR is 29 with a standard deviation of 13.4, whereas the Teach-In method averaged 34.9 with a standard deviation of 13.4.

At the end of the experiment, each user filled out a SUS questionnaire. The averaged result was 75 with a standard deviation of 12.

6 Discussion

Both calculated OWAS scores are acceptable. Even though MiReP scored slightly worse (109/400 vs. 104/400), we assume that using AR for a 30-min commissioning does not significantly worsen the posture of a user when compared to the Teach-In method.

As expected, the result of the accuracy assessment shows that the average error of 9.7 mm (standard deviation 6 mm) when using AR is significantly worse than the average error of 1 mm (standard deviation 0.7 mm) when using Teach-In.

However, as depicted in Fig. 11, the potential influence of a systematic error can be detected. After estimating the systematic error and adjusting the result, an average error of 2.9 mm is calculated, confirming the existence of a systematic effect.
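
A simple way to perform this adjustment is to estimate the systematic component as the mean offset vector over all path points and remove it before computing the residual error; the sketch below assumes per-point error vectors in millimetres.

```python
# Hedged sketch of the systematic-error adjustment described above.
import numpy as np


def residual_error(offsets: np.ndarray) -> float:
    """offsets: (n, 2) or (n, 3) array of per-point error vectors in mm."""
    systematic = offsets.mean(axis=0)   # estimated constant bias
    residuals = offsets - systematic    # bias-corrected offsets
    return float(np.linalg.norm(residuals, axis=1).mean())
```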

Fig. 11 Comparison of best (avg. error 2.8 mm) to worst (avg. error 25.4 mm) result with AR

As users register the device initially from an individually chosen angle and position, and especially considering that AR glasses have known limitations regarding depth perception [18], we assume this to play a major part in the systematic error. However, as the best user achieved a fairly high level of accuracy (2.8 mm), a proper setup can result in higher program quality. In addition to accuracy, Fig. 9 also shows a 32% reduction in programming time when compared to Teach-In. Using a two-sample t-test, a significant difference is deduced.
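
The corresponding significance check can be reproduced with a standard two-sample t-test, e.g. via SciPy; the arrays below are illustrative placeholders, not the measured study data.

```python
# Two-sample t-test on programming times; values are placeholders only.
from scipy import stats

times_ar = [12.1, 10.4, 11.8, 9.9]         # placeholder values (minutes)
times_teach_in = [16.5, 15.2, 17.8, 14.9]  # placeholder values (minutes)

t, p = stats.ttest_ind(times_ar, times_teach_in)
print(f"t = {t:.2f}, p = {p:.4f}")  # significant if p < 0.05
```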

The results of the NASA-TLX show that, even though the difference is not significant, the global task load of AR is slightly lower than that of Teach-In programming. It is noticeable that the physical load perceived when working with AR is lower than with Teach-In. While this seems to contradict the results of the OWAS, it can be partly explained by the shorter working time and the lower effort needed to reach the aspired goal. Moreover, the grading of the perceived performance correlates with the measured accuracy.

Even though the sample size of 21 is small, the results of the SUS indicate a generally good usability of the presented AR-assisted robot programming.

7 Conclusion and Outlook

The presented studies show that, when utilising AR-assisted robot programming, the commissioning of an offline created robot program is more efficient but less accurate than with Teach-In robot programming. It was shown that the initial registration has a major effect on the overall error; hence, the introduction of additional visual assistance and continuous feedback to the user could improve the performance of AR-assisted robot programming. The OWAS showed that, during a task duration of 30 min, the use of an AR device does not negatively impact the posture of a user. This is confirmed by the NASA-TLX, which shows a slightly better global workload than Teach-In programming. Complemented by the results of the SUS, the presented AR-assisted robot programming in the commissioning of offline created robot programs has a generally good usability.

The results also show that the presented AR-assisted robot programming is currently not accurate enough to fully substitute Teach-In programming in the commissioning of offline created robot programs. However, due to its intuitiveness, it is plausible to use AR either when the accuracy suffices for the depicted use case, or as a transition stage that reduces the necessary amount of Teach-In optimisation and thereby the overall duration of commissioning.