1 Introduction

With increasing requirements on the power density of industrial products, their inner structures are becoming more and more complicated, leaving increasingly narrow assembly spaces for workers. Narrow assembly spaces can cause many assembly issues related to human factors, such as low reachability, poor visibility or poor ergonomics, which negatively affect productivity, product quality, production cost and worker safety [4, 6, 25]. It is therefore vitally important to assess the assemblability of products with narrow assembly spaces in early design phases, and to address human-factor-related assembly issues as early as possible. Traditionally, engineers use computer-aided design (CAD) software with a digital human model (DHM) to evaluate assemblability in virtual environments. In CAD software, engineers can observe the assembly through a third-person perspective (3PP) camera with free position and direction, which enables them to evaluate the design comprehensively. However, the efficiency of CAD software in assembly assessment is limited by its low usability. For example, setting a large number of key frames for DHM motion is tedious [16, 25]. Moreover, assembly simulation in CAD software may not reflect the actual human factor issues in narrow assembly spaces, because users cannot subjectively perceive the assembly space by dragging the body parts of the DHM with a mouse and keyboard [25].

Virtual reality (VR) has attracted extensive attention in recent years due to its high level of immersion and interactivity, and has been widely used in industrial applications such as assembly simulation [12], workspace assessment [32], training [1] and education [40, 41]. VR can efficiently support assemblability assessment by simulating human factors through real-time motion capture (MoCap) and stereo display, which is more natural and immersive than traditional CAD software [11, 21, 24, 35, 36, 45]. In our review, most current studies used the first-person perspective (1PP) to visualize the virtual environments of assembly simulation in VR. The 1PP interface presents the virtual environment to users in a natural and intuitive way because it matches how we habitually view the world, leading to more natural interaction behaviors [5, 10, 15]. However, because it lacks spatial information outside the field of view (FOV), the 1PP interface can cause problems in assembly simulation and assessment in narrow spaces: (1) users cannot evaluate the overall assembly status while simulating postures, since they cannot observe the assembly scene from a global view; (2) posture simulation may be incorrect because users are unaware of penetrations between their body parts (especially the head, back, and legs) and surrounding objects; (3) users cannot see their operations or the assembly status in occluded spaces. These problems seriously limit the reliability of assemblability assessment in VR. Some studies proposed separating design/simulation and evaluation/assessment into two processes through record-and-replay or operator-supervisor methods [12, 32], which partially solves the problems of the 1PP interface. However, these solutions may not be efficient, because they require either the collaboration of two people or a separate replay phase, which is labor-intensive and time-consuming.

To solve these problems, a straightforward approach is to integrate 1PP and 3PP to reap the advantages of both perspectives. Techniques for integrating multiple perspectives have been extensively studied to improve users’ navigation [29], collaboration [34], and manipulation performance [23]. World-in-miniature (WIM), which integrates 1PP and 3PP simultaneously into a single interface with good interaction performance [14], is one of the most established of these techniques. However, we found no study applying WIM to VR-aided assembly assessment. In this paper, we explored the application of WIM in VR-aided assembly assessment, proposed a multi-perspective interface (MPI) based on a handheld WIM, and tested whether it improves users’ ability to simulate and assess assembly in narrow assembly spaces. To achieve these goals, we conducted two studies. The first study tested users’ general interaction performance in the MPI to reveal its characteristics and provided the basis for the second study. The second study tested the performance of MPI users during assembly simulation and assessment in narrow spaces, with the aim of verifying our research goals. Combining the results of the two studies, we gained insight into the application of the MPI to VR-aided assemblability assessment in narrow assembly spaces.

The rest of this paper is organized as follows. Section 2 reviews applications of VR in industry and related research on multiple-perspective integration. Section 3 introduces the design and implementation of the proposed MPI. Section 4 presents the user study that tested the interaction performance of the MPI with a general interaction task. Section 5 describes a prototype VR-aided assemblability assessment system with the proposed MPI and presents the test of the MPI in an assembly assessment task. Section 6 summarizes the findings and suggests future directions.

2 Related work

2.1 The application of VR in industry

VR technology has been successfully applied in industry. Performing assembly simulation in VR has proved to be more useful and efficient than traditional approaches that rely on expert knowledge or CAD software [4, 22, 25]. Moreover, VR can support both physical and cognitive analyses through interactive simulations, and it improves analysis accuracy by applying physical evaluation to human manikins driven by real motion captured with MoCap devices [25]. Studies have shown that, combined with MoCap devices and immersive display interfaces, VR-aided simulation can efficiently support ergonomics evaluation [11], design review [7, 12, 31], assemblability and maintainability assessment [13, 16, 26, 32], training [1] and education [40, 41].

The 1PP interface is the most popular visual interface in current VR applications. Unlike the 3PP interface of traditional CAD software, the 1PP interface imitates the way humans observe the real world, making the user’s interaction with the virtual world more immersive and engaging [1, 25, 41]. Some researchers suggested other solutions that take advantage of the 3PP interface. Dimitropoulos et al. [12] proposed a framework for designing virtual environments for assembly operation simulation. In their framework, the layout of the virtual simulation environment is first designed on a 2D plane from a bird’s-eye view (3PP). The 3D virtual environment is then generated from the layout, and the user simulates the assembly operations in 1PP. Ottogalli et al. [32] proposed a supervisor-operator method to simulate and evaluate assembly in a human-robot assembly line. In their system, the operator’s motion data are first captured with a MoCap system; an external supervisor then checks the postures and movements performed during the simulation. However, these solutions are labor-intensive and time-consuming, since they require the collaboration of two people or a replay phase to complete the evaluation.

2.2 1PP and 3PP

1PP is commonly used in VR applications, enabling more accurate interaction and a stronger sense of embodiment in immersive virtual environments. Salamin et al. [37] found that 1PP is suited to fine manual operations, especially when users need to look down or straight ahead to manipulate immobile objects. Debarba [10] and Gorisse [15] found that 1PP allows more accurate interaction and induces a stronger sense of embodiment toward a virtual body. Bhandari et al. [5] indicated that 1PP provides better spatial perception and performance in dynamic tasks.

3PP is usually applied in desktop applications. Research on 3PP in VR points out its advantages in spatial awareness and navigation performance. Salamin [37] and Gorisse [15] found that 3PP provides a wider view of the periphery of the virtual body, increasing the user’s spatial awareness. Cmentowski et al. [8] found that navigating in a bird’s-eye 3PP can reduce users’ disorientation after relocation.

Both 1PP and 3PP also have drawbacks. The most obvious drawback of 1PP is that users cannot obtain information outside its FOV. The main drawback of 3PP is the inconsistency between the 3PP viewpoint and the human’s natural perspective, which reduces the sense of self-location [15] and interaction performance [2, 15]. Alonso et al. [2] studied the effect of the 3PP camera position on interaction difficulty and found that interaction performance increases as the 3PP camera gets closer to the user’s head. Medeiros [28] argued that 1PP is more suitable for navigation tasks in a space with obstacles: users completed the navigation task in less time and with higher precision in 1PP than in 3PP, which contradicts Gorisse’s findings. Medeiros explained this contradiction by noting that their study focused on the relation between users and the virtual environment, which depends strongly on how users judge distances and, in turn, affects their sense of spatial awareness.

2.3 World-in-miniature

World-in-miniature is one of the commonly used methods for introducing additional perspectives into the original view. A WIM is a miniature copy of the virtual environment that can be applied to search, manipulation, selection and navigation tasks in VR. WIM was first proposed by Stoakley et al. [39], who established a direct correspondence between life-size objects in the virtual world and miniature objects in the WIM. Users can establish new viewpoints more swiftly and naturally by rotating the WIM in their hands, which has proved advantageous in reducing users’ cognitive burden. Pausch et al. [33] indicated that a WIM is similar to a cube map, which can provide users with a manipulable global view within the original 1PP. Andujar et al. [3, 43] reported that users can observe and select occluded objects in the original 1PP by rotating the WIM, which partly removes the need to navigate in order to select occluded objects. Evin et al. [14] used a WIM model orbiting around the user to reduce cybersickness during viewpoint movement while maintaining natural interaction.

In summary, previous research indicates that integrating multiple perspectives can improve interaction performance in virtual environments. However, the effects of an MPI on assemblability assessment in narrow assembly spaces have not yet been investigated, which is the purpose of this study.

3 Design of MPI

3.1 Design consideration

To provide users with as much information as possible, the MPI should display the 1PP and 3PP views simultaneously. A straightforward solution is to use one perspective as the primary view of the interface and integrate the other into the interface as an assisted view. The primary view is the main view users see in the head-mounted display (HMD) most of the time, while the assisted view is observed through a dedicated user interface. This interface must be designed carefully so that users have sufficient freedom to observe the assembly scene through it, while interacting with it does not unduly interfere with their interaction with the virtual world.

To meet these considerations, we propose using 1PP as the primary view of the MPI and a handheld WIM as the 3PP assisted view, which offers the following benefits:

  • With the 1PP primary view, users can interact with virtual world naturally and precisely [5, 15], while suffering less cybersickness [14]. This is beneficial to ensure the accuracy of assembly operation simulation in the narrow spaces.

  • The WIM displays the miniature world in 3D, which enhances the expression of spatial information and enables users to make accurate assessments.

  • The handheld WIM allows users to control the orientation of the WIM model by simply rotating the hands, which is in line with people’s intuition.

3.2 Implementation

We implemented our work on the off-the-shelf VR platform HTC Vive [18] (shown in Fig. 1), since it provides extensive VR interaction functions and is easy to develop for. The VR environment is displayed on the Vive HMD. The HMD, two Vive handheld controllers and five Vive trackers [19] were used as the markers for full-body MoCap, and the two controllers were also the input devices for interacting with the virtual world and controlling the WIM model.

Fig. 1

Device setup for VR display and full-body motion capture

We developed the MPI and the virtual environments in the Unity 3D game engine [42], which provides basic tools for virtual world rendering, object grasping [9], inverse kinematics, avatar animation [30] and physical simulation [44]. Figure 2 shows the composition of the proposed MPI. The implementation details of the MPI are presented as follows.

Fig. 2

The composition of the proposed MPI based on the handheld WIM. $T_o$ represents the relative offset between the WIM model and the hand joint of the avatar

1PP primary view

The 1PP primary view of the MPI is implemented with the Unity engine’s default VR camera, which has a FOV of 110 degrees and moves with the tracked HMD.

WIM modeling

The implementation of the WIM follows [33, 43]. In the virtual environment, we created a WIM model consisting of miniatures of all objects in the real-sized world, including the virtual environment and the avatar. All miniatures have the same model hierarchy as the original objects and were scaled down to 0.2 times their original size; the WIM scale can be adjusted for ease of observation. In every rendering update, the position, rotation, and scale of each original object are reproduced on the corresponding miniature: if an object is translated, rotated, or scaled in the real-sized world, the same transformation is applied to its miniature in the miniature world, with the additional WIM scaling applied. Rendering effects, such as color changes or deformation, are also reproduced on the miniatures to ensure consistency.
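Because the miniatures share the original hierarchy, one way to realize the per-frame update is to copy each node’s local (parent-relative) transform and rendering state, and to apply the WIM scale once at a single miniature root. The sketch below assumes this design; `Node`, `build_wim`, `sync_node` and `sync_wim` are hypothetical names, and the scene graph is simplified to a uniform scale and a `color` field standing in for any rendering state. It is not the authors’ Unity code.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z)


@dataclass
class Node:
    """Simplified scene-graph node holding a local (parent-relative) transform."""
    name: str
    position: Vec3 = (0.0, 0.0, 0.0)
    rotation: Quat = (1.0, 0.0, 0.0, 0.0)
    scale: float = 1.0
    color: str = "default"  # stands in for rendering state such as color changes
    children: List["Node"] = field(default_factory=list)


def build_wim(world_roots: List[Node], wim_scale: float = 0.2) -> Node:
    """Clone the real-sized hierarchy under one WIM root that carries the WIM scale."""
    return Node("WIM", scale=wim_scale, children=copy.deepcopy(world_roots))


def sync_node(original: Node, miniature: Node) -> None:
    """Reproduce the original's local transform and rendering state on its miniature."""
    miniature.position = original.position
    miniature.rotation = original.rotation
    miniature.scale = original.scale
    miniature.color = original.color
    # The hierarchies are identical, so children can be paired by index.
    for o_child, m_child in zip(original.children, miniature.children):
        sync_node(o_child, m_child)


def sync_wim(world_roots: List[Node], wim_root: Node) -> None:
    """Call once per rendering update; the root's own scale is left untouched."""
    for original, miniature in zip(world_roots, wim_root.children):
        sync_node(original, miniature)
```

Keeping the WIM scale only on the root means that adjusting the miniature size is a single assignment, and that local transforms can be copied verbatim, since in a scene graph a parent’s scale applies to everything beneath it.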

The WIM model is set as a child transform of the palm joint of the avatar that represents the user in the virtual world. The WIM model is located 0.2 m above the palm joint, and its upward direction is aligned with the palm normal vector (pointing out of the back of the palm). The relative pose between the WIM model and the palm joint is fixed during rendering updates. Thus, the position of the WIM model follows Eq. (1):

$$ T_{w} = T_{p} + R_{p} R_{r} T_{o} $$
(1)

where $T_w$ is the translation vector of the WIM model in the world coordinate system; $T_p$ and $R_p$ are the translation vector and rotation matrix of the palm joint in the world coordinate system, respectively; $R_r$ is the relative rotation matrix between the WIM model and the palm joint; and $T_o$ is the offset of the WIM model relative to the palm joint. Here, $T_o$ is set to (0, 0.2, 0).
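Eq. (1) can be checked numerically. The NumPy sketch below computes $T_w$ and, as an assumption implied by the definition of $R_r$, the WIM orientation $R_w = R_p R_r$. It also assumes that the palm frame’s y axis is the palm normal, consistent with $T_o = (0, 0.2, 0)$; `rot_y` and `wim_pose` are hypothetical helper names.

```python
from typing import Tuple

import numpy as np

# Offset of the WIM model in the palm frame: 0.2 m along the palm normal (y axis).
T_O = np.array([0.0, 0.2, 0.0])


def rot_y(angle_rad: float) -> np.ndarray:
    """Rotation matrix about the local y axis, assumed here to be the palm normal."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def wim_pose(T_p: np.ndarray, R_p: np.ndarray, R_r: np.ndarray,
             T_o: np.ndarray = T_O) -> Tuple[np.ndarray, np.ndarray]:
    """Return (T_w, R_w) for the WIM model.

    T_w follows Eq. (1); R_w = R_p @ R_r is the WIM orientation implied by
    R_r being the rotation of the WIM model relative to the palm joint.
    """
    T_w = T_p + R_p @ R_r @ T_o
    R_w = R_p @ R_r
    return T_w, R_w
```

Because the rotation control only changes the yaw of $R_r$ about the same y axis that $T_o$ lies on, $R_r T_o = T_o$. Under these assumptions, rotating the WIM changes its orientation but not its position, which stays at $T_p + R_p T_o$.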

WIM control

We implemented several control functions to reduce the interference of the WIM control on user’s interaction with the virtual world. The control functions were implemented based on the application programming interface (API) provided by SteamVR [9].

  • Switching. Users can switch the hand to which the WIM model is attached, so that the WIM does not interfere with the operating hand. Pressing the panel button on a controller triggers an event callback that changes the parent transform of the WIM model to the hand joint corresponding to that controller, and the parameters in Eq. (1) are updated accordingly.

  • Rotation. Users can redirect the WIM when it faces an inconvenient observing direction by touching different parts of the controller panel. Touching the left half of the panel rotates the WIM model clockwise about the palm normal vector, and touching the right half rotates it counterclockwise. The function is realized by adjusting the y component of the Euler angles of $R_r$ in Eq. (1) according to the touch duration.

  • Toggling. Users can toggle the visibility of the WIM model if it occludes the target place. To achieve this, we attached a callback to the press event of the menu button on the left-hand controller. The callback toggles the visibility by enabling and disabling the root GameObject of the WIM model.

In addition, we defined the menu button on the right-hand controller as a lock button to avoid the Midas touch problem. Pressing this button puts the WIM into an editable mode in which all control functions are available. Pressing the lock button again exits the editable mode, after which all control actions are ignored.
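The control logic can be sketched as a small event handler in which the lock state gates every other function. This is a minimal sketch, not the SteamVR-based implementation. `WimController` and its method names are hypothetical, the rotation speed `yaw_rate_deg_s` is an assumed value, and which sign of yaw appears clockwise depends on the engine’s coordinate handedness.

```python
from dataclasses import dataclass


@dataclass
class WimController:
    """Hypothetical handler mapping controller events to the WIM control functions."""
    holding_hand: str = "left"     # hand joint the WIM model is parented to
    yaw_deg: float = 0.0           # y Euler angle of R_r in Eq. (1)
    visible: bool = True           # active state of the WIM root object
    editable: bool = False         # lock state; control actions are ignored when False
    yaw_rate_deg_s: float = 90.0   # assumed rotation speed while the panel is touched

    def on_right_menu_pressed(self) -> None:
        """Lock button: enter or leave the editable mode."""
        self.editable = not self.editable

    def on_panel_pressed(self, hand: str) -> None:
        """Switching: re-parent the WIM to the hand whose panel was pressed."""
        if self.editable and hand in ("left", "right"):
            self.holding_hand = hand

    def on_panel_touched(self, half: str, duration_s: float) -> None:
        """Rotation: change the yaw of R_r by an amount proportional to the touch time.

        Left half gives positive yaw, right half negative yaw (sign convention assumed).
        """
        if not self.editable:
            return
        sign = 1.0 if half == "left" else -1.0
        self.yaw_deg = (self.yaw_deg + sign * self.yaw_rate_deg_s * duration_s) % 360.0

    def on_left_menu_pressed(self) -> None:
        """Toggling: show or hide the whole WIM model."""
        if self.editable:
            self.visible = not self.visible
```

Gating all handlers on `editable` mirrors the lock button’s purpose: accidental presses or touches during assembly operations leave the WIM unchanged.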

4 Study 1: General interaction study

To investigate users’ interaction performance in the MPI, we conducted a within-subjects user study (N = 22) comparing the MPI with a simple 1PP interface. Since the purpose is to improve assembly simulation and assemblability assessment in VR, we mainly focus on users’ hand-manipulation performance and spatial awareness. The former relates to the evaluation of assembly complexity, operation time, etc.; the latter relates to the evaluation of ergonomics, reachability, visibility, etc. Moreover, we are concerned with posture simulation errors caused by penetration between body parts and virtual objects, which are crucial for assembly simulation accuracy in narrow spaces. Hence, the experiment was designed to test the following hypotheses:

  1. H1: Users have significantly better spatial awareness in the MPI than in the simple 1PP interface.

  2. H2: There are significantly fewer collisions between the user’s body parts and virtual objects in the MPI than in the simple 1PP interface.

  3. H3: Hand-manipulation performance in the MPI is not significantly different from that in the simple 1PP interface.

According to [14, 38], the proposed MPI is expected to improve interaction performance since it combines the advantages of 1PP and 3PP. However, H1 and H2 still needed to be tested, because no previous study had tested a handheld WIM-based MPI in narrow spaces. Regarding H3, there was no evidence that the assisted view in the MPI would leave hand-manipulation performance in the 1PP primary view unaffected, so this also had to be verified.

4.1 Experiment design

To test users’ interaction behavior in narrow spaces, we designed, following previous work [28, 38], an interaction task in which users transport an object through obstacles; such a task makes the experimental variables easy to control and is easy for subjects to understand. The layout of the task scenario is shown in Fig. 3a. The task space is separated into two zones (A and B) by a brick wall. A dummy-like avatar represents the subject in the virtual environment and is controlled through the MoCap devices. The brick wall is wide enough that subjects cannot walk around it to the other side; an opening in the wall provides the passage between the two zones. One table is placed in each zone, and in each trial the tables are randomly relocated on a circle of radius 1.5 m centered on the opening. A trophy is placed on the table in zone B, and a starting ball, which subjects interact with to start a trial, floats above the table in zone A. To vary how strongly the obstacle constrains navigation [28], we designed eight opening sizes, listed in Table 1.
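The random table placement can be illustrated as sampling an angle on the 1.5 m circle, restricted to each table’s own side of the wall. This is a minimal sketch under assumed conventions. The wall is taken to lie along the x axis (z = 0) with the opening at the origin, zone B at z > 0 and zone A at z < 0. The `margin_deg` that keeps tables away from the wall is an assumed parameter, and `place_tables` is a hypothetical name.

```python
import math
import random
from typing import Tuple

RADIUS_M = 1.5  # radius of the placement circle around the opening (from the text)

Point = Tuple[float, float]  # (x, z) on the floor plane


def place_tables(rng: random.Random, margin_deg: float = 20.0) -> Tuple[Point, Point]:
    """Return floor positions (x, z) of the zone-A and zone-B tables for one trial.

    Assumed layout: wall along the x axis (z = 0), opening centred at the origin,
    zone B on the z > 0 side and zone A on the z < 0 side.
    """
    def sample(lo_deg: float, hi_deg: float) -> Point:
        theta = math.radians(rng.uniform(lo_deg + margin_deg, hi_deg - margin_deg))
        return (RADIUS_M * math.cos(theta), RADIUS_M * math.sin(theta))

    table_b = sample(0.0, 180.0)    # upper half-circle, zone B
    table_a = sample(180.0, 360.0)  # lower half-circle, zone A
    return table_a, table_b
```

Sampling the angle rather than a Cartesian point guarantees that both tables stay exactly 1.5 m from the opening, so only their direction varies between trials.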

Fig. 3

a Experiment scenario and b experiment task

Table 1 The settings of the opening size. The size of the opening is determined by three dimensions: width (W), height of the upper boundary (UB) and height of the lower boundary (LB)

Figure 3b outlines the procedure of the experimental task, and the user’s views of the MPI during the task are shown in Fig. 4. The subject starts a trial by interacting with the starting ball in zone A. The subject then moves to zone B through the opening (see Fig. 4a) and gets the trophy from the table in zone B (see Fig. 4b). Next, the subject passes through the opening again to transport the trophy back to zone A. Finally, the subject completes the trial by placing the trophy on the table in zone A. When the avatar collides with the wall, the wall turns red as a visual warning (see Fig. 4a). Subjects were instructed to avoid collisions between the avatar and the wall during the task.

Fig. 4

User’s views of the MPI in the general interaction task: a subject passed through the opening; b subject grasped the trophy

4.2 Subjects

We recruited 22 college students from the local campus (9 females; aged 21 to 26, M = 23.59, SD = 1.44) through posters and social media. Subjects rated their familiarity with VR on a 5-point Likert scale (1: never used VR before; 5: senior player): 5 subjects rated 1, 11 rated 2 (had used VR once or twice), 2 rated 3 (had used VR many times), 3 rated 4 (use VR frequently), and one rated 5. Each subject received 200 yuan for participating in the experiment.

4.3 Measures

We used the time taken to pass through the opening (PT) and the collision time ratio (CTR) as quantitative measures of spatial awareness and collision awareness, respectively [28]. PT starts counting when the avatar enters the opening and accumulates until the avatar fully exits the opening. Meanwhile, whenever the avatar collides with the wall, the collision time (CT) accumulates along with PT. When a trial is completed, CTR is calculated as the ratio of the accumulated CT to the accumulated PT in that trial.
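The PT/CT/CTR bookkeeping can be sketched as a per-frame accumulation over one trial. The sketch assumes the simulation logs, for every rendered frame, its duration and two flags: whether the avatar is inside the opening and whether it touches the wall. `Frame` and `trial_metrics` are hypothetical names. Both passes through the opening (to zone B and back) add to the same totals, and collisions outside the opening do not count.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Frame:
    dt: float          # frame duration in seconds
    in_opening: bool   # avatar is between entering and fully exiting the opening
    colliding: bool    # avatar is in contact with the wall


def trial_metrics(frames: Iterable[Frame]) -> Tuple[float, float, float]:
    """Accumulate PT and CT over one trial and return (PT, CT, CTR)."""
    pt = 0.0
    ct = 0.0
    for f in frames:
        if f.in_opening:
            pt += f.dt            # PT only runs while the avatar is in the opening
            if f.colliding:
                ct += f.dt        # CT accumulates alongside PT
    ctr = ct / pt if pt > 0 else 0.0  # guard against a trial with no passage
    return pt, ct, ctr
```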

We defined the manipulation time (MT) as a quantitative measure of hand-manipulation performance [15, 37]. MT is the time subjects spent getting the trophy and placing it.

To obtain subjective feedback, we administered a post-test questionnaire with three questions on subjects’ ratings of their spatial awareness, collision awareness, and manipulation performance, listed below. Each question was rated on a 7-point Likert scale from 1 (strongly disagree) to 7 (strongly agree).

  1. Q1: I can accurately sense the distance to the virtual objects.

  2. Q2: I can easily avoid the collision with the wall.

  3. Q3: I can easily grasp and place the trophy.

After the experiment, we collected subjects’ preferences and comments on each condition.

4.4 Experiment procedures

The experiment consisted of the following steps. First, the subject answered pre-experiment questions providing profile information such as age and VR experience. We then explained the experiment task and the use of the MPI. After that, we helped the subject put on the VR devices and calibrated the system. To familiarize the subject with the procedure, we let the subject perform the task in a training scene, where they could freely explore the virtual environment and practice using the MPI; training was limited to 5 min. After training, the subject performed the formal test, in which the quantitative measurements were collected for each experimental condition. The order of the conditions was counterbalanced across subjects. In each condition, the subject performed eight trials with different opening sizes and randomized table positions, and the order of opening sizes was randomized for each condition. After completing each condition, the subject filled in the post-test questionnaire for subjective ratings and took a 1-min break. After all conditions were completed, a post-experiment questionnaire collected the subject’s preference and open-ended comments.
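The text states only that the condition order was counterbalanced and that the opening-size order was randomized within each condition. The sketch below shows one common way to do both: alternate AB/BA orders by subject index, and shuffle the eight sizes independently per condition. The alternation scheme, the size IDs 1 to 8 (standing in for the rows of Table 1) and the name `session_plan` are assumptions for illustration.

```python
import random
from typing import List, Tuple

CONDITIONS = ("1PP", "MPI")
N_OPENING_SIZES = 8  # eight opening sizes; IDs 1..8 are hypothetical labels


def session_plan(subject_index: int, rng: random.Random) -> List[Tuple[str, List[int]]]:
    """Return [(condition, opening-size order), ...] for one subject.

    Even-indexed subjects get 1PP first, odd-indexed subjects MPI first (assumed
    AB/BA scheme); each condition gets its own random permutation of the sizes.
    """
    order = CONDITIONS if subject_index % 2 == 0 else CONDITIONS[::-1]
    plan = []
    for condition in order:
        sizes = list(range(1, N_OPENING_SIZES + 1))
        rng.shuffle(sizes)
        plan.append((condition, sizes))
    return plan
```

With 22 subjects, this alternation gives each order to exactly 11 subjects, which balances order effects between the two conditions.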

4.5 Results

We report the results of the quantitative measurements and subjective ratings for all conditions. Data analysis was performed in OriginPro 2021, with significance set at the .05 level. The normality of the measurements was first checked with the Shapiro-Wilk test. If the measurements were normally distributed, a paired-sample t-test was conducted; otherwise, a paired-sample Wilcoxon signed-rank test was used. Bonferroni correction was automatically applied for multiple comparisons.
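The test-selection rule above can be reproduced outside OriginPro. The following sketch uses SciPy as a swapped-in tool (the analysis in the text was done in OriginPro, not with this code). It checks each condition for normality with `scipy.stats.shapiro` as described, then runs `scipy.stats.ttest_rel` or `scipy.stats.wilcoxon`, and applies a Bonferroni adjustment for an assumed number of comparisons `n_comparisons`. `compare_paired` is a hypothetical name. By default, SciPy’s Wilcoxon test reports the W statistic rather than the Z value given in the results.

```python
from typing import Sequence, Tuple

import numpy as np
from scipy import stats


def compare_paired(a: Sequence[float], b: Sequence[float],
                   alpha: float = 0.05, n_comparisons: int = 1
                   ) -> Tuple[str, float, float, bool]:
    """Pick a paired test from Shapiro-Wilk results, then Bonferroni-adjust p.

    Returns (test name, statistic, adjusted p, significant at alpha).
    """
    x = np.asarray(a, dtype=float)
    y = np.asarray(b, dtype=float)
    # Normality checked per condition, as described in the text.
    normal = stats.shapiro(x).pvalue > alpha and stats.shapiro(y).pvalue > alpha
    if normal:
        name, res = "paired t-test", stats.ttest_rel(x, y)
    else:
        name, res = "Wilcoxon signed-rank", stats.wilcoxon(x, y)
    p_adj = min(1.0, float(res.pvalue) * n_comparisons)  # Bonferroni, capped at 1
    return name, float(res.statistic), p_adj, bool(p_adj < alpha)
```

A common variant checks the normality of the paired differences instead of each condition; the sketch keeps the per-condition check to match the procedure described above.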

Quantitative measurement

We report the analysis results for PT, CTR, and MT. CT is not analyzed separately because it is interrelated with PT, but it is listed in Table 2 for reference. For each condition, the measurements of the eight trials were averaged to obtain the measurement result.

Table 2 Mean and standard deviation (in parentheses) of the quantitative measurements

Passing-through time

Statistical results of PT are shown in Fig. 5a and Table 2. The Shapiro-Wilk test showed that PT was normally distributed, so we used a paired-sample t-test. The results show that PT in 1PP is significantly lower than that in MPI (t = 3.92, p < .001).

Collision time ratio

Statistical results of CTR are shown in Fig. 5b and Table 2. The Shapiro-Wilk test showed that CTR was normally distributed in both conditions, so we used a paired-sample t-test. The results show that CTR in MPI is not significantly different from that in 1PP (t = −1.78, p = .132).

Manipulation time

Statistical results of MT are shown in Fig. 5c and Table 2. The Shapiro-Wilk test indicated that MT was not normally distributed, so we used a Wilcoxon signed-rank test. The results reveal no significant difference in MT between the MPI and 1PP conditions (Z = 2.07, p = .053).

Fig. 5

Statistical results of the quantitative measurements of the experiment. a Passing-through time (PT). b Collision time ratio (CTR). c Manipulation time (MT). Boxes represent the interquartile range, whiskers represent the 95% confidence intervals, square dots represent the means, lines in the boxes represent the medians, and diamond dots represent the outliers

Subjective rating

Subjective ratings from the post-test questionnaire are summarized in Table 3. The Shapiro-Wilk test showed that the ratings for all questions were not normally distributed, so we conducted paired-sample Wilcoxon signed-rank tests with Bonferroni correction. The results are presented below.

Table 3 Median and interquartile range (in parentheses) of the subjective ratings (1 = strongly disagree, 7 = strongly agree)

Spatial awareness

Question Q1 measures users’ subjective ratings of their spatial awareness in each condition. The Wilcoxon signed-rank test shows that the ratings are significantly higher in MPI than in 1PP (Z = 2.50, p = .004).

Collision awareness

Question Q2 measures users’ subjective ratings of their collision awareness in each condition. The Wilcoxon signed-rank test shows that the ratings are significantly higher in MPI than in 1PP (Z = 3.03, p < .001).

Manipulation performance

Question Q3 measures users’ subjective ratings of their manipulation performance in each condition. The Wilcoxon signed-rank test shows no significant difference in Q3 ratings between the MPI and 1PP conditions (Z = 0.8, p = .625).

Preference and comments

At the end of the experiment, we collected subjects’ preferences and open-ended comments in a semi-structured interview. Twenty-one subjects stated that they preferred MPI, while one subject (S4) preferred 1PP. In the open-ended comments on MPI, 10 subjects gave positive feedback, 4 gave negative feedback, 7 were neutral, and one declined to comment. The most recurring positive feedback concerned the intuitiveness and the flexible operation. Subjects who were positive about MPI thought that “WIM displayed information in 3D, which made the assisted view more intuitive” (S8, S11, S17). Subjects could freely adjust the viewpoint of the assisted view by rotating their wrist, making it possible to “simultaneously observe the situation in both sides of the wall”; thus, WIM was helpful to “control the body movement and complete the task” (S7, S11, S12, S15, S17, S18, S19, S22). However, subjects who were negative about MPI thought that WIM brought “too much information” (S1, S14, S22), and that “the operation of adjusting WIM could lead to accidental collisions” (S3, S16). Subjects who were neutral about MPI thought that “MPI is novel and intuitive, but I rarely used it in the task because I can complete the task only relying on 1PP” (S4, S13, S18, S20).

4.6 Discussion

The PT results do not support H1, as PT was significantly longer in MPI than in 1PP. One possible reason is that the richer spatial information provided by MPI made users pass through the opening more carefully, thus extending PT in the MPI condition. This is in line with the Q1 ratings, in which subjects rated their spatial awareness higher in MPI than in 1PP.

The CTR results do not support H2, as no significant difference in CTR was found between the MPI and 1PP conditions. This raises a contradiction, since more careful movements should have led to a lower CTR; moreover, the Q2 ratings show higher collision awareness in MPI than in 1PP. One possible explanation is that our experimental task reached a ceiling effect of full-body visuomotor control, compared with the influence of the interface [10]: because collision avoidance in our experiment requires fine motor control, users cannot avoid collisions perfectly using visual cues alone. Another possible reason is that accidental collisions caused by adjusting the WIM increased the CTR in the MPI condition, as stated in the open-ended comments of S3 and S16.

The MT results support H3, showing that MPI had no significant effect on users’ manipulation performance compared with 1PP. This is consistent with the Q3 ratings. The open-ended comments indicate that the handheld WIM offers flexibility in controlling the assisted view, as reported in previous work [3, 33, 43]. However, the comments also exposed several drawbacks, such as accidental collisions caused by controlling the WIM and information overload. This indicates that the method for controlling the assisted view needs further improvement for better usability.

5 Study 2: Assemblability assessment task

In the second study, the MPI was tested with a case in which the position of a bolt-nut pair on the fuselage belly had to be assessed for assemblability. This case was chosen because it reflects the scenario of workers assembling in a narrow space and the impact of the narrow space on assemblability. The assessment results, usability and workload of the MPI were compared with those of the simple 1PP interface in a between-subjects pilot study.

5.1 Case introduction

We tested MPI in a hypothetical assembly task in which a bolt-nut pair needs to be tightened on the fuselage belly. In the task, the operator uses a wrench to fasten the bolt and nut through a manipulation window in the fuselage plate, as shown in Fig. 6. The assemblability depends on the height of the fuselage plate (H), the distance from the bolt-nut pair to the manipulation window (D), and the interference of the strengthening rib. The configurations for the different levels of assemblability are listed in Table 4.

Fig. 6

Assembly case of tightening a bolt-nut pair on the fuselage belly

Table 4 Configurations of the assemblability level

As shown in the table, the plate height had three settings, 1.7 m, 1.4 m, and 1.1 m, corresponding to the operator working at head, chest, and abdomen height, respectively; these settings have different impacts on the visibility, reachability, and comfort of the assembly task. The distance from the bolt-nut pair to the manipulation window had two settings, 0.9 m and 0.5 m, corresponding to the operator tightening the bolt and nut at the limit of their reach or at a nearer position, respectively. In addition, when the distance was set to 0.9 m, the bolt and nut were located on the far side of the strengthening rib, which could hamper the tightening operations during the task.
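Crossing the three plate heights with the two distances gives six combinations, which matches the six assembly tasks used later in the experiment. The short Python sketch below enumerates them, assuming that the six tasks are exactly this 3 × 2 cross and that the rib interferes exactly when D = 0.9 m, as the text states. The dictionary keys and the function name `configurations` are illustrative only, and the mapping of these combinations onto the difficulty levels of Table 4 is not reproduced here.

```python
from itertools import product

# Plate height H (m) -> body level at which the operator works (from the text)
PLATE_HEIGHTS = {1.7: "head", 1.4: "chest", 1.1: "abdomen"}
# Distance D (m) from the bolt-nut pair to the manipulation window
DISTANCES = {0.9: "end of reach", 0.5: "nearer position"}


def configurations():
    """Enumerate the assumed 3 x 2 grid of (H, D) task configurations."""
    configs = []
    for h, d in product(PLATE_HEIGHTS, DISTANCES):
        configs.append({
            "H": h,
            "D": d,
            "working_level": PLATE_HEIGHTS[h],
            "reach": DISTANCES[d],
            # At D = 0.9 m the bolt and nut lie on the far side of the rib
            "rib_interferes": d == 0.9,
        })
    return configs


# Usage: print one line per configuration
for cfg in configurations():
    print(cfg["H"], cfg["D"], cfg["working_level"], cfg["rib_interferes"])
```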

5.2 VR-aided assemblability assessment system

We developed a prototype VR-aided assemblability assessment system for performing the assembly assessment. Figure 7 shows the composition of the system, which consists of three modules: the introduction panel, the rating panel, and the simulation zone. Figure 8 shows the user’s views in the system with MPI. The introduction panel explained to the user how to use the system. In the simulation zone, the user could simulate the assembly task using full-body MoCap and controller-based hand interaction to gain an intuitive impression of the assemblability. After the simulation, the rating panel appeared, and the user assessed the assemblability on this panel using line-pointing interaction. The rating panel was hidden while the user performed the assembly simulation to avoid distraction.

Fig. 7

Prototype of the VR-aided assemblability assessment system used in the experiment

To provide basic interaction feedback during the simulation, we developed several features. First, the virtual parts changed color when the avatar or the virtual tools collided with them, as shown in Fig. 8a and c. Second, when the virtual tools reached the bolt and nut, audio feedback (the sound of metal striking) was played to inform users about reachability.
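The paper does not describe how these two cues are implemented. As an illustration only, the engine-agnostic Python sketch below models one frame of the feedback logic. `FeedbackState`, `update_feedback`, and the choice to play the sound only on the first frame of contact (so that it does not repeat every frame while the tool stays on the target) are assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackState:
    """Hypothetical per-session state for the collision and reach cues."""
    highlighted: set = field(default_factory=set)  # parts currently recolored
    was_reaching: bool = False  # did the tool touch the bolt/nut last frame?


def update_feedback(state, colliding_parts, tool_reaches_target):
    """Compute one frame of feedback.

    Returns (parts_to_recolor, play_metal_sound). Parts stay recolored while
    a collision persists; the sound plays only on the frame contact begins.
    """
    state.highlighted = set(colliding_parts)
    play_sound = tool_reaches_target and not state.was_reaching
    state.was_reaching = tool_reaches_target
    return state.highlighted, play_sound


# Usage: the avatar hits the rib, then the wrench touches the nut for two frames
state = FeedbackState()
print(update_feedback(state, {"rib"}, False))  # recolor the rib, no sound
print(update_feedback(state, set(), True))     # contact begins: play sound
print(update_feedback(state, set(), True))     # contact continues: no repeat
```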

Fig. 8

User’s views of MPI in the assembly assessment task: a tighten the bolt, b tighten the nut, c tighten the bolt and nut, d rate the assemblability

5.3 Experiment design

The experiment was designed as a between-subjects pilot study. We invited six college students (3 females) majoring in mechanical engineering, with a mean age (± SD) of 24.5 (± 3.39) years, to participate in the experiment. All subjects had prior experience with VR devices. The subjects were randomly divided into two experimental groups: the MPI group, in which the user interface of the VR-aided assessment system was MPI, and the 1PP group, in which the user interface was a simple 1PP interface. Subjects in both groups were encouraged to use the provided interface to perform the assemblability assessment. We used visibility, reachability, and comfort [16] as the assemblability metrics in the experiment. The metrics were rated on a 5-point Likert scale, with 1 being very poor and 5 being very good. After the experiment, the subjects rated the usability and workload of the interface based on their experience.

The experiment procedure was as follows. Each subject was welcomed on arrival. Then, the system usage and the experimental task were briefly introduced according to the group to which the subject belonged. Next, the subject put on the VR devices with the help of the experimenters. At the beginning of the experiment, the virtual environment was scaled according to the subject’s height in order to eliminate bias caused by height differences between subjects. After that, the subjects familiarized themselves with the system in a training scene for 5 min. After the training, the subjects performed the formal experimental tasks, in which they evaluated six assembly tasks with different levels of assemblability. For each assembly task, the subjects first simulated the task in the simulation zone of the system. Then, they used the rating panel to assess the visibility, reachability, and comfort of tightening the bolt (B), tightening the nut (N), and tightening the bolt and nut (BN), in that order. The order of the six assembly tasks was randomized across subjects in each condition.
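The per-subject procedure can be sketched as an ordered list of steps. The Python sketch below assumes a seeded shuffle for randomizing the task order (the paper does not state the randomization method) and uses hypothetical task IDs 1 to 6 and a hypothetical function `session_plan`. It only generates the sequence of simulate and rate steps; scaling the environment to the subject’s height is not modeled.

```python
import random

TASK_IDS = [1, 2, 3, 4, 5, 6]  # hypothetical IDs for the six assembly tasks
OPERATIONS = ["B", "N", "BN"]  # rated successively, in this order
METRICS = ["visibility", "reachability", "comfort"]


def session_plan(subject_id, base_seed=0):
    """Return the ordered steps for one subject.

    The task order is shuffled with a per-subject seed, so it differs across
    subjects but is reproducible. Each task is simulated first, then rated
    for every operation and metric.
    """
    rng = random.Random(base_seed * 1000 + subject_id)
    order = TASK_IDS[:]
    rng.shuffle(order)
    steps = []
    for task in order:
        steps.append(("simulate", task))
        for op in OPERATIONS:
            for metric in METRICS:
                steps.append(("rate", task, op, metric))
    return steps


# Usage: 6 tasks x (1 simulation + 3 operations x 3 metrics) = 60 steps
plan = session_plan(subject_id=1)
print(len(plan), plan[:2])
```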

5.4 Results

Assemblability assessment

We summarize the assemblability assessment results of the two experimental groups (MPI and 1PP) in Fig. 9. We invited two experts experienced in CAM software to provide expert assessment results using Siemens Jack v8.2. Since the experts could not subjectively evaluate comfort in the desktop software, we used the Rapid Upper Limb Assessment (RULA) [27] score computed by the software as the expert assessment of comfort. The RULA scores (on a 7-point scale) were rescaled to 5-point Likert scores in the same manner as the other assessment metrics.
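The exact rescaling is not given. One plausible reading, sketched below in Python purely as an assumption, is a linear mapping that also reverses direction: a RULA grand score of 1 indicates the lowest postural risk, whereas a Likert score of 1 here means "very poor", so RULA 1 maps to 5 and RULA 7 maps to 1. The function name `rula_to_likert` is hypothetical.

```python
def rula_to_likert(rula_score: int) -> float:
    """Map a RULA grand score (1..7, higher = worse) onto 1..5 (higher = better).

    Assumed linear, direction-reversing mapping: 1 -> 5.0, 4 -> 3.0, 7 -> 1.0.
    """
    if not 1 <= rula_score <= 7:
        raise ValueError("RULA grand score must lie in 1..7")
    return 5 - 4 * (rula_score - 1) / 6


# Usage: convert every possible RULA grand score
print([round(rula_to_likert(s), 2) for s in range(1, 8)])
```

If integer Likert values were required, the result could be rounded, at the cost of merging some adjacent RULA levels.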

Fig. 9

Assemblability assessment results regarding the visibility, reachability, and comfort in the assembly tasks of tightening the bolt (B), tightening the nut (N), and tightening the bolt and nut (BN) under the six difficulty levels

We analyzed the correlations between the assessment results of the experimental groups and the expert assessment using Spearman correlation analysis and the Euclidean distance. The Spearman correlation analysis shows that the visibility and reachability scores in both the MPI and 1PP conditions are significantly correlated with the expert assessment scores (see Table 5). The Euclidean distance in the MPI condition is smaller than that in the 1PP condition for reachability assessment, but larger than that in 1PP for visibility assessment. For comfort assessment, there is no obvious correlation between the scores of the experimental groups and the expert scores.
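The two agreement measures can be sketched in a few lines of Python. This sketch assumes that, for one metric, each input vector lists a group’s scores and the expert scores for the same sequence of assessment items (for example, B, N, and BN under each difficulty level); the paper does not specify the item layout or how individual subjects’ scores were aggregated. Spearman’s ρ is computed as the Pearson correlation of tie-averaged ranks, which is also what `scipy.stats.spearmanr` returns; the pure-Python version simply avoids the dependency. Significance testing is not included.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def euclidean_distance(x, y):
    """Straight-line distance between two equal-length score vectors."""
    return math.dist(x, y)
```

Because the Euclidean distance grows with the number of items, it is only comparable between conditions scored on the same item set, which holds for the MPI versus 1PP comparison.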

Table 5 Spearman correlation (ρ) and Euclidean distance (d) between the assemblability assessment scores made in the VR-aided system and by the experts

Subjective rating

Subjects’ ratings of the interface usability were collected using the System Usability Scale (SUS) [20]. The statistical results of the SUS scores are shown in Fig. 10a. Both the mean and the median SUS score of the 1PP group (mean = 75.8, median = 77.5) are higher than those of the MPI group (mean = 73.3, median = 70).
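For reference, the standard SUS scoring rule converts ten item responses on a 1 to 5 scale into a score from 0 to 100: odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the sum is multiplied by 2.5. The Python sketch below applies that rule and summarizes a group by mean and median, as in Fig. 10a. The function names are illustrative, and since individual responses are not reported in the paper, the usage value is made up.

```python
from statistics import mean, median


def sus_score(responses):
    """Standard SUS score (0..100) from ten responses on a 1..5 scale."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses, each in 1..5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


def summarize(scores):
    """Group summary as reported in the text: mean and median."""
    return {"mean": mean(scores), "median": median(scores)}


# Usage: a neutral respondent (all 3s) scores 50.0
print(sus_score([3] * 10))
```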

Subjects’ ratings of their workload with the different interfaces were collected using the NASA Task Load Index (TLX) [17]. Figure 10b shows the statistical results of the TLX scores. Both the mean and the median TLX score in MPI (mean = 45.94, median = 44.83) are higher than those in 1PP (mean = 44.17, median = 39.67).
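The paper reports overall TLX scores without stating whether the raw or the weighted variant was used. Both standard NASA-TLX scoring variants are sketched below in Python for comparison: the raw TLX is the unweighted mean of the six subscale ratings (0 to 100), while the weighted TLX multiplies each rating by the number of times that subscale was chosen in the 15 pairwise comparisons and divides by 15. Which variant produced the reported values is not known, and the function names are illustrative.

```python
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def raw_tlx(ratings):
    """Raw (unweighted) NASA-TLX: mean of the six subscale ratings (0..100)."""
    if len(ratings) != len(SUBSCALES):
        raise ValueError("expected one rating per subscale")
    return sum(ratings) / len(SUBSCALES)


def weighted_tlx(ratings, weights):
    """Weighted NASA-TLX: weights are tallies from the 15 pairwise comparisons."""
    if len(ratings) != len(SUBSCALES) or len(weights) != len(SUBSCALES):
        raise ValueError("expected one rating and one weight per subscale")
    if sum(weights) != 15:
        raise ValueError("pairwise-comparison weights must sum to 15")
    return sum(r * w for r, w in zip(ratings, weights)) / 15


# Usage: the same ratings can give different raw and weighted scores
ratings = [70, 20, 40, 30, 60, 25]
print(raw_tlx(ratings), weighted_tlx(ratings, [5, 0, 3, 2, 4, 1]))
```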

Fig. 10

Statistical results of the System Usability Scale (a) and the TLX scale (b)

Qualitative analysis

At the end of the experiment, we collected subjects’ feedback using several open-ended questions, including “What do you think of using the current system to assess the assemblability?” and “What are the advantages and the disadvantages of the current system when you use it to perform the task?” We list the recurring themes in the feedback below.

In the MPI group, subjects found MPI supportive for understanding spatial relations. Subject 2 mentioned that “In MPI, I can easily find the target part when it is in a covert place.” Subject 3 acknowledged the advantages of MPI in supporting reachability assessment, noting that “the assisted view can display the relative position between my body and the assembly parts, and I can see the spatial relations in the occluded space. These are helpful for judging the reachability.” Subject 1 complained that “the WIM lacks the scaling function which might be helpful for observing the details.”

In the 1PP group, subjects mainly complained that the interface displayed too little spatial information. Subject 2 complained that “I am unaware of where the collision happened.” Subject 3 mentioned that “it needs a long time to find the bolt and nut.” Subject 1 suggested that “it is better to indicate the assembly position from the beginning, and it is difficult to determine the reachability of tightening the bolt and nut simultaneously.”

5.5 Discussion

By analyzing the assemblability assessment results, we found that the reachability assessments seemed to be more accurate in MPI than in 1PP. This could be explained by the higher spatial awareness provided by MPI, which allows users to understand spatial relations more accurately. As reported in the open-ended comments, in MPI users can perceive the relative position between their body and the assembly parts in occluded spaces, leading to more accurate reachability assessment. However, users were more likely to overestimate visibility in MPI than in 1PP. One possible reason is that the rich spatial information provided by MPI may confuse users as to whether they could actually see the assembly part from the 1PP view. We did not find an obvious correlation between the comfort ratings made in the VR-aided system and the expert scores. One possible reason is that subjective comfort ratings are only weakly related to RULA scores.

The subjective feedback shows that users reported higher usability in 1PP than in MPI. We assume that adjusting the MPI increases the complexity of the system and takes users a long time to get used to. As a result, users could not quickly master MPI during the training phase of the experiment, which reduced their subjective evaluation of usability. The results also reveal that users reported a higher workload in MPI than in 1PP. This can be explained by the fact that in MPI users need to simultaneously process information from both the primary view and the assisted view, which increases their mental workload. However, it should be noted that the usability and workload measurements may be influenced by the small sample size. In the future, we will conduct a formal study to further investigate the impact of MPI on system usability and user workload, taking into account the comments gained in this study.

6 Conclusion

In this paper, we have presented MPI, an interface for VR-aided assemblability assessment that integrates 1PP and 3PP using a handheld WIM. We conducted two studies to test the performance of the proposed interface. The first study investigated the characteristics of MPI in a general interaction task. The salient findings are that MPI can improve users’ spatial awareness by providing rich information about spatial relations and collisions with surrounding objects, while its impact on users’ hand-manipulation performance is negligible. However, MPI did not improve users’ collision-avoidance performance, possibly because adjusting the WIM model caused additional collisions. The second study tested the performance of MPI in the assemblability assessment task. MPI showed advantages in supporting reachability evaluation thanks to the better spatial awareness it provides compared with 1PP. However, users were likely to overestimate visibility when using MPI. The subjective feedback revealed lower usability scores and higher workload ratings in MPI than in 1PP, but these results need further confirmation due to the small sample size.

The main contributions of this paper are summarized as follows:

  • We improved the interface for VR-aided assemblability assessment using the proposed MPI, which enables users to simulate the assembly operations naturally in the virtual environment, while assessing the assemblability efficiently according to their own actions.

  • We conducted two studies to test the performance of the proposed interface. Results indicated that the proposed MPI can improve user’s spatial awareness and the ability of reachability assessment.

  • We demonstrated the application of the proposed interface in VR-aided assembly assessment in a prototype system.

In conclusion, we believe the proposed MPI is an effective interface for VR-aided assemblability assessment. The novelty of MPI is that, to our knowledge, it is the first approach that allows users to evaluate assemblies in VR with superior spatial awareness while naturally simulating assembly operations. We implemented MPI using a handheld WIM, which lets users adjust the assisted view in a way that matches natural human habits, but MPI’s control still needs further improvement. One possible improvement is to reduce the control complexity of MPI, for example through adaptive WIM positioning and adjustment [34], to further reduce the interference of WIM control with bimanual manipulation. Another possible improvement is to provide collision cues in MPI to strengthen users’ collision awareness.