1 Introduction

Manual operations in production contribute substantially to the overall costs of a product and should therefore be measured and evaluated carefully [6]. This becomes increasingly challenging as products change more rapidly and workplaces become decentralized. Thus, an evaluation on the machine level only (see Fig. 1) is no longer sufficient. Worker behavior can no longer be assumed to be static, as it also includes locomotion, mainly driven by new production strategies such as Chaku-Chaku or decentralized production [18]. Consequently, the cell level, the shop floor level, and even the factory level should also be considered when evaluating manual work efficiency [1].

Figure 1 shows a typical structure of a manufacturing company. From top to bottom, the level of detail at which manual work can be analyzed increases. Often, the description of manual work on the enterprise and factory level is not detailed enough to conduct an optimization. Therefore, manual worker operations are mainly evaluated on the machine level, at fixed workplaces, by applying so-called MTM (Methods-Time Measurement). MTM is one of the most widely used predetermined time measurement systems; it defines a particular time, measured in so-called Time Measurement Units (TMUs), for each basic operation described in it [6]. One TMU is equivalent to 0.036 s, which allows for an in-depth analysis even of basic operations in the sub-second range. MTM comprises different standards, each with its own set of basic operations and predetermined times.
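As a worked example of this conversion, a basic operation rated at 7 TMUs, such as the MTM-2 eye action referred to later in Section 5, corresponds to

$$ t = 7~\text{TMU} \times 0.036~\tfrac{\text{s}}{\text{TMU}} = 0.252~\text{s}. $$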

Fig. 1

Various investigation levels for manual work, adapted from [27]

The most comprehensive MTM standard is MTM-1, which consists of the following basic operations: reach, grasp, release, move, position, apply pressure, disengage, turn, crank, visual inspection, and body, leg, and foot movements. Due to its high level of detail, MTM-1 is not only complex but also precise. This is particularly favored in mass production, which should be analyzed at the highest possible level of detail. If such a detailed work description is not required, e.g., for a smaller production pipeline, the MTM-2 standard is used. MTM-2 is based on MTM-1 and consists of the following basic operations: get, put, apply pressure, regrasp, crank, eye action, foot motion, step, and bend and arise. MTM-2 is less precise than MTM-1, but it allows for faster MTM analyses. For even smaller production entities, e.g., batch production, the so-called MTM-MEK is used. It consists of the following basic operations: get and place, place, handle tool, operate, motion cycles, body motion, and visual control. Finally, to make MTM more universally applicable, the MTM-UAS standard was introduced. It has the same basic operations as MTM-MEK but with even less detailed information for each basic motion. A brief overview of the abovementioned MTM standards is presented in Fig. 2.

Fig. 2

Application fields for different MTM standards based on the IMD Technical Platform

When analyzing manual operations with MTM, a video of a work process is recorded and manually transcribed. This process is time-consuming because the subsequent manual video transcription has to be conducted by an MTM expert. Furthermore, the MTM analysis is frequently done when the workplace is already built and can hardly be altered anymore. To overcome this issue, cardboard mock-ups that represent the workplace to be investigated at full scale are used in the planning phase. Besides the time, cost, and generated waste, cardboard engineering often lacks details, which makes it difficult for workers to perform in the same way as they would at a real workplace. Such details include the raw materials and intermediate products to be processed or assembled by a worker, which are usually not available during planning.

However, novel opportunities arise as assembly lines and production plants are increasingly planned and designed virtually. The underlying virtual planning data, which is usually available in the form of three-dimensional geometry (e.g., computer-aided design data), could be harnessed to generate a virtual environment (VE) in which workers perform the manual operations to be analyzed. This can be achieved through virtual reality (VR) technology, which immerses users in a VE while allowing them to interact with virtual objects through hand-held controllers. A few prior works have investigated the potential of VR technology to support manual work measurement with a particular focus on MTM analysis. However, the setups in these works are characterized by numerous limitations, such as not allowing for real walking, non-intuitive interaction, and the weightlessness of virtual objects [10, 11]. Consequently, the real work procedure cannot be fully replicated, and the results of work measurement in VR might be of limited benefit only [5].

Within this paper, we address this by developing a VE that emulates a typical workplace for manual assembly, in which a worker can walk naturally while completing a given task. The goal of this paper is to investigate work measurement in VR, including an MTM analysis and a time study, compared to work measurement at the corresponding physical workplace.

To achieve this, we conduct a comparative study in which we analyze 21 participants performing the same task in a VE and at a physical workplace using an MTM-2 approach and a time study. We organize the remainder of this work as follows: Section 2 introduces the related work, and Section 3 describes the task and the technical setup. The results are presented in Section 4 and discussed in Section 5. Finally, Section 6 concludes the paper, and Section 7 gives an outlook.

2 Related work

The measurement of manual work in the manufacturing industry is associated with various benefits, including productivity gains and quality improvements [6]. With growing mass customization in automated or semi-automated work environments, manual assembly tasks also grow in importance. Depending on a product’s configuration, manufacturing and assembly tasks can vary in time, and thus current research focuses on decentralized production.

In such decentralized production environments, so-called “slow” products no longer block “faster” ones, and thus the overall output of production can be kept stable or even increased, although multiple product variants are produced. However, decentralized production requires not only careful planning but also a thorough investigation of manual assembly tasks, including walking and human-robot interaction. Moreover, dedicated support for planning such manual work is still missing, and thus MTM analyses still rely on traditional cardboard engineering.

Since most of the 3D models for machines and tools already exist, it seems reasonable to employ VR technology to conduct MTM analyses. An early approach to digitally evaluating an ergonomic design was introduced by Schmidt and Wendt [25]. Their software tool “COSIMAN” (Computer Simulation of Manual Assembly) was able to evaluate basic designs but was of limited use for more complex scenarios. Moreover, the handling of the software tool was too complex for industrial applications. Further applications of COSIMAN were also introduced by Kummelsteiner [16]. Another VR-supported MTM approach was introduced by Chan [7], who simulated a manual assembly line and retrieved MTM values by using virtual humans (avatars). However, controlling the avatars with a computer mouse, as well as missing functionalities for the presentation of products and process designs, required substantial effort to achieve simulation results.

Despite such drawbacks regarding operability, a wide range of digital human model programs exists, as stated by Mühlstedt et al. [21]; they all offer an exocentric view of the manufacturing task. The main focus of these virtual human simulation programs is to analyze human postures and to design the workplace, as described by Yang et al. [30]. Furthermore, these programs can assess the visibility and accessibility of an operation, as stated by Chedmail et al. [8], or evaluate postures, as stated by Bubb et al. [4]. Conventional MTM, as well as posture analysis techniques, can also be integrated into these virtual human programs [13]. Based on inverse kinematics, the physical strain on each joint can be calculated for any given operational and external load [29].

Since motion capture systems have become available, motion tracking methods that track an operator’s real movements to control the manikin have become popular due to the resulting lifelike motion. As early as 2000, Chryssolouris et al. [9] proposed a “virtual assembly work cell,” which allows natural interaction with the VE to perform an assembly task. However, the overall system complexity and the particular limitations of the tracking system only allowed for a limited range of applications. Instead of a consecutive definition of the avatar’s movement, newer systems allow for a direct coupling of the avatar to the worker’s movements. To do so, the user wears components of an optical tracking system, a data glove, and a data helmet.

In 2005, Jianping and Keyi [15] tracked the real operator’s movement in real time to control the virtual human “JACK” within a virtual maintenance system. Furthermore, a virtual assembly design environment was developed by Jayaram et al. [14], who also use a virtual human that can be directly controlled by the worker’s movements. The motion data of the real operator was recorded and imported into JACK to perform an ergonomic analysis. Later, Wu et al. [28] used data gloves and a field-of-view tracker to capture the operator’s hand for virtual assembly path planning in a VE. In 2012, Osterlund and Lawrence [23] used full-body optical motion tracking to control avatars in astronaut training. However, their approach depended on a well-defined physical setup, much like the abovementioned cardboard installation. According to various sources [19, 26], establishing a VE with complete physical attributes is still difficult and time-consuming, since such systems require a long setup time, in particular for optical, outside-in tracking systems. Following Seth et al. [26], simulating realistic interaction using haptic devices is also still difficult for the virtual prototyping community, while sensations such as real walking to perceive distances are completely missing. Hence, it is still inconvenient for a real operator to control a virtual human. Following Qiu et al. [24], current human motion capturing technology substantially limits the range of its applications.

With the spread of information technology, different approaches for analyzing workers’ performance and training new workers have appeared. Benter et al. [3] introduced an approach that used a 3D camera to capture data and analyzed working time with the MTM-1 method. The workplace consisted of three workstations, and the worker assembled gearboxes for 20 min. Ma et al. [20] proposed a framework to evaluate manual work operations with the support of motion tracking. For this purpose, a marker-based optical tracking system with a total of 13 markers was used. The system consisted of a head-mounted display (HMD), a data glove, and eight cameras for body motion and hand-gesture recognition. To evaluate workers’ performance, the so-called Maynard operation sequence technique (MOST) was intended to be used. There are three motion groups in the MOST system: general move, controlled move, and tool use. To validate the technical feasibility of the proposed framework, only two work tasks were considered: lifting an object and pushing a button. No walking was included. Another approach to automatically monitor the execution of human-based assembly operations was proposed by Andrianakos et al. [1]. Their approach is based on machine learning techniques and utilizes a Single Shot Detector algorithm for object detection. This algorithm detects objects of multiple categories and sizes in real time. For data collection, a single vision sensor (i.e., a webcam) was used. To evaluate this approach, a simple three-step assembly task was proposed. An important restriction of this method is that a task must be completed in a specific, predefined order. Müller et al. [22] designed a Smart-Assembly-Workplace (SAW), which was used for a bicycle e-hub assembly. SAW was designed to share knowledge about the assembly sequence with less qualified workers in an intuitive way and consists of a combined working and learning environment. To define the learning sequence and time, MTM is used. For data retrieval, the motion-sensing device Microsoft Kinect is installed on top of the workplace, facing downwards, which allows determining the worker’s hand positions without using markers. However, the reliability of the Kinect depends on illumination conditions: it was reported that the hand tracking often fails to correctly locate the hands if illumination conditions vary. A tree-based approach to recognize MTM-UAS codes in VR was proposed by Bellarbi et al. [2]. It captures the tracking data of an HMD and controllers and divides this data into small sequences of movements. In this algorithm, all possible body motions belong to one of three categories: eye movement, body movement, or hand movement. Each small sequence of movements from the captured data is compared to the data from the algorithm tree in order to obtain the MTM-UAS code corresponding to this sequence.

First approaches by Kunz et al. [17] showed that real walking can also be integrated into such virtual setups. However, a comparison to the real-world counterpart is still missing, and the quality of MTM analyses and time studies conducted in VR, compared to the analysis of a corresponding real-world task, still needs to be investigated [5, 11]. This paper therefore contributes to the topic of performing MTM analyses completely in VR by comparing the findings to a similar real-world workplace.

3 Methodology

In this section, we describe the methodology of this paper, including a user study in which each participant performed the same task in VR and in reality. Furthermore, we outline the data retrieval and analysis procedure used to conduct the intended comparison of work measurement in reality and VR.

3.1 Participants

The user study comprises 21 participants, 6 female and 15 male, recruited from the university’s student body. Since MTM is designed for workplace evaluation by experienced workers, we had to design a task that participants could perform without any prior training. As the task was deliberately chosen to be very simple, any biasing effect due to learning is unlikely. Thus, each participant performed the same task twice, once at the real workplace and once in VR. To avoid any bias from the task order, participants started either with the real task or with the virtual task in alternating order.

3.2 Technical setup

The technical setup consists of two identical workplaces, one in reality (see Fig. 4, left) and one in VR (see Fig. 4, right). Participants access the VR workplace with the HTC Vive Pro system. It uses so-called lighthouses to track the user’s head position and orientation, as well as the hands holding the HTC Vive controllers. In our virtual setup, participants use only one controller to manipulate the objects. The controller has additional buttons and a touchpad to allow further interaction with the VE. Pulling the controller’s trigger, for instance, performs a grasping action to grab a virtual object with the virtual hand. Users see their virtual hand and the complete VE through the HMD, which also visually disconnects them from the real world. To reach all relevant objects in the VE, users can walk freely within a 5 m × 5 m tracking space. In both the real and the virtual setting, the users’ movements are video-captured for later manual transcription.

3.3 Task description

To compare the MTM analysis in reality and VR, a task that could be completed by inexperienced study participants was chosen. It is a simplified version of an industrial task, containing basic operations of the MTM-2 standard such as get, put, eye action, step, and bend and arise. In contrast to MTM-UAS and MTM-MEK, which evaluate walking time based on geometric distances only, MTM-2 allows counting the user’s steps and assigns a predetermined TMU value to each step. This allows evaluating walking based on actual human behavior and not only on geometrically derived values.
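As a sketch of this step-based evaluation, with $n_{\text{steps}}$ counted steps and the predetermined per-step value $\text{TMU}_{\text{step}}$ taken from the MTM-2 data card, the walking time follows as

$$ t_{\text{walk}} = n_{\text{steps}} \times \text{TMU}_{\text{step}} \times 0.036~\tfrac{\text{s}}{\text{TMU}}, $$

whereas the distance-based standards would instead estimate $t_{\text{walk}}$ from the geometric path length.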

The top-down layout of the workplace is shown in Fig. 3. At the beginning of each study session, a pre-recorded video with text instructions is shown to the participants in VR. In this video, the task and the sequence of the process steps are introduced. The virtual workplace is designed to fit into the 5 m × 5 m tracking space supported by the HTC Vive system. The participants start in front of a pallet at a distance of 1.8 m from the larger table. They grab the box from the floor and put it on the larger table. Afterward, they grab the screwdrivers one by one from the smaller table and put them into the box. Then they walk to the box lid, grab it, and close the box with the lid. The last step is to take the closed box, walk to the pallet, and put the box on the pallet. There is no time limit for this task, and participants are asked to complete it at a natural speed. The real and virtual environments are shown in Fig. 4.

Fig. 3

A top-down layout of the real and virtual workplace, enhanced with the 5 m × 5 m VR tracking space

Fig. 4

Comparison of real and virtual tasks during the user study. Left: the real workplace; right: the virtual counterpart

3.4 Data retrieval

Each study participant was recorded while performing the task in reality and in VR. The real task was recorded with a video camera, while “screen recordings” (i.e., recordings of a participant’s first-person view) were created for the task in VR. Subsequently, each participant’s movements were manually transcribed into the corresponding MTM-2 codes. The transcription was done independently by two people. Additionally, a time study was conducted for each user by measuring their task completion time in VR and in reality.
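The following is a minimal sketch, not the tooling used in this study, of how such a transcription can be aggregated into a predetermined time and set against the stopwatch value of the time study; all MTM-2 codes and TMU values in the example (except the eye action E = 7 TMU mentioned in Section 5) are placeholders.

```python
# Minimal sketch (not the tooling used in this study): aggregate an MTM-2
# transcription into a predetermined time and set it against the stopwatch
# value from the direct observation time study.

TMU_TO_SECONDS = 0.036  # one TMU corresponds to 0.036 s


def mtm2_time_seconds(transcription, tmu_table):
    """Sum the predetermined TMU values of all transcribed basic operations
    and convert the total into seconds."""
    total_tmu = sum(tmu_table[code] for code in transcription)
    return total_tmu * TMU_TO_SECONDS


# Placeholder TMU values for illustration only; real values must be taken
# from the official MTM-2 data card (only the eye action E = 7 TMU is quoted
# in this paper).
tmu_table = {"GET": 14, "PUT": 14, "STEP": 18, "E": 7}

transcription = ["STEP", "GET", "STEP", "PUT", "E"]  # transcribed basic operations
time_study_seconds = 2.5                             # measured completion time (example)

print(f"MTM-2: {mtm2_time_seconds(transcription, tmu_table):.2f} s "
      f"vs. time study: {time_study_seconds:.2f} s")
```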

4 Results

This section describes the findings of our comparative user study, including MTM-2 analysis and time study for the task conducted in reality and VR.

4.1 MTM-2 and time study in reality

Since MTM relies on statistically derived values (the “measured times”), this first data evaluation assesses whether the study participants performed the given task at a normal working speed. For this, we compare the time study and the MTM-2 transcription of the task conducted in reality by each participant. The results of this analysis are shown in Table 1 on the right and plotted in Fig. 5.

Table 1 Overall completion time for each user measured in VR and reality by direct observation time study and MTM-2
Fig. 5

Comparison of the time study and the transcribed MTM-2 times in reality, shown in seconds

When analyzing the results of the time study, we see a mean value of $m_{\text{time study, real}} = 27.667$ s with a standard deviation of $SD_{\text{time study, real}} = 4.4$ s. This shows that there are considerable differences between individual study participants. However, when analyzing the MTM-2 values of the transcription instead, we obtain $m_{\text{MTM-2, real}} = 27.710$ s and $SD_{\text{MTM-2, real}} = 2.4$ s. This is an expected outcome, since the MTM-2 analysis decouples workers’ movements from their individual speed in performing a task.

4.2 MTM-2 and time study in VR

Similar to the analysis in reality, we analyze both the MTM-2 and time study for each participant performing the task in VR, which is shown in Table 1 on the left and plotted in Fig. 6.

Fig. 6

Comparison of time study and the transcribed MTM-2 times in VR, shown in seconds

When analyzing the time study for the VR task only, we see a mean value of $m_{\text{time study, VR}} = 36.286$ s with a standard deviation of $SD_{\text{time study, VR}} = 7.9$ s. This shows that performance differs substantially between participants in VR. Seven users had an MTM-2 time larger than their time study value, meaning that they performed the actions in VR faster than foreseen by MTM-2 (see Table 1 on the left).

In comparison to the time study, the transcribed MTM-2 values of exactly the same work process are considerably smaller. The MTM-2 mean value for VR is $m_{\text{MTM-2, VR}} = 28.991$ s, while the standard deviation is $SD_{\text{MTM-2, VR}} = 2.7$ s. As in the analysis in reality, this is an expected result, since MTM-2 decouples the speed of a worker’s movement from the descriptive class of the movement.

4.3 Comparison of results in reality and VR

Since it is one strength of MTM-2 to make workplaces comparable, we chose the overall MTM-2 values of the task conducted in reality and in VR for the comparison. We hypothesize that equal MTM-2 values show that a VR workplace is capable of adequately representing a real workplace.

The overall comparison of the virtual and the real task completion times is shown in Table 2. Figure 7 shows that the MTM-2 values in reality and VR are very similar for each user, which supports our previously stated hypothesis that an MTM-2 analysis of a task conducted in VR is of comparable quality and can, in fact, replace the analysis of the real task.

Table 2 Overall ratios of the virtual to the real completion time for each user
Fig. 7

Comparison of MTM-2 times in VR and reality, shown in seconds

However, it is also visible that the values are not perfectly consistent. To further quantify the resulting error, the quotient of the corresponding MTM values for each individual user is calculated; we refer to it as the VR MTM-2 accuracy, and it should ideally be equal to 1. The deviation of the VR MTM-2 accuracy from 1 gives a measure of the quality of the MTM-2 analysis in VR. Figure 8 shows the deviation from this ideal value. For the VR MTM-2 accuracy, we obtain a mean of $m_{\text{VR MTM-2 accuracy}} = 1.05$ and $SD_{\text{VR MTM-2 accuracy}} = 0.08$.
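Formally, the VR MTM-2 accuracy for participant $i$ restates this quotient as

$$ \text{accuracy}_i = \frac{t^{\text{MTM-2}}_{\text{VR},\,i}}{t^{\text{MTM-2}}_{\text{real},\,i}}, $$

so that a value of 1 means the transcription of the VR trial yields exactly the same predetermined time as the transcription of the real trial.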

Fig. 8

Deviation of MTM-2 times in VR and reality

5 Discussion

While there is already a good correlation between the MTM-2 analyses in reality and VR, we made some observations during the course of the study that could explain the typically longer task completion times in the VE.

Missing haptic feedback for grasping objects

When grasping an object, humans rely on vision to coarsely approach the object to be grasped, while the decision whether the object is actually grasped relies on haptic cues only. Such haptic feedback was missing in the VE, and thus users had to watch for a yellow wireframe around the object indicating that the virtual hand touches the virtual object. Although this is a common way in VR to indicate whether objects are touched, it imposes an additional cognitive effort, which was taken into account during the MTM-2 transcription as an “eye action” (E = 7 TMUs). Moreover, we noticed that users significantly slow down their physical movement, in particular when trying to grasp the lid from the ground. This is probably because users are afraid of colliding the hand-held controller with the physical floor.

Missing haptic feedback for placing objects

When placing one real object on top of another, humans again rely on haptic feedback indicating that the two physical objects have come into contact. This in turn indicates that the grasp can be released to place the object. Because the VE lacks this feedback channel, we noticed a slowdown of the users’ movements in the following three situations: (i) placing the virtual box on the virtual table, (ii) placing the virtual lid on the virtual box, and (iii) placing the virtual box on the virtual pallet.

Unfamiliar navigation means

We noticed that participants who had never experienced VR before did not walk naturally when navigating the first distance from the starting position to the box. They either took smaller steps or generally moved more slowly without shortening their step size. However, we also noticed that users quickly become familiar with real walking in VR, so that already after this first, very short path their walking speed was close to natural again.

6 Conclusion

In this paper, we addressed the research question of how well an MTM-2 analysis and a time study can be conducted in VR compared to traditional analyses in reality. Conducting the study in VR brings the benefit of avoiding non-optimal workplace designs already in the planning phase, without requiring the construction of physical mock-ups (cardboard engineering) for data acquisition.

Our comparative user study includes two identical workplaces in reality and VR and shows that it is possible to achieve comparable results in both setups using the MTM-2 evaluation. However, the overall completion time measured by direct observation time study differs substantially. This leads us to the conclusion that manual work procedures in VR need to be analyzed by means of predetermined times, such as MTM-2, since the overall completion time measured by direct observation is higher in VR.

7 Outlook

Future work should supplement such user studies with standardized questionnaires such as the NASA TLX [12] to measure additional task load that might be evoked by using a VR system. These studies could address a more complex work scenario at a real industrial workplace, with professional workers as participants.

While the detection and transcription of basic motions was done manually in our study, the full potential of an MTM analysis in VR can only be harnessed with an automated transcription. Four days were needed for the manual MTM-2 transcription of the recordings of all 21 users, each with one trial in VR and one in reality. This effort can be considered a rough estimate of the savings potential of an automated transcription. Therefore, our study can also be seen as a proof of concept for a fully automated MTM analysis, as proposed by Bellarbi et al. [2].
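As a purely illustrative sketch of what such an automated pipeline could look like (this is not the approach of Bellarbi et al. [2], and all thresholds and rules are assumptions), the captured head and controller trajectories could be segmented at rest phases and each segment classified with simple rules before being mapped to MTM codes:

```python
import numpy as np

# Purely illustrative sketch of an automated transcription pipeline: segment
# the captured controller trajectory at rest phases and classify each segment
# with simple hand-crafted rules. All thresholds are assumptions.


def segment_by_rest(positions, timestamps, rest_speed=0.05):
    """Split a trajectory (N x 3 positions in meters) into motion segments
    separated by phases in which the speed drops below rest_speed (m/s)."""
    speeds = np.linalg.norm(np.diff(positions, axis=0), axis=1) / np.diff(timestamps)
    moving = speeds > rest_speed
    segments, start = [], None
    for i, is_moving in enumerate(moving):
        if is_moving and start is None:
            start = i
        elif not is_moving and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(moving)))
    return segments


def classify_segment(positions, segment):
    """Toy rule set: a large horizontal displacement is counted as a step,
    everything else as a hand motion to be mapped to get/put classes."""
    start, end = segment
    displacement = positions[end] - positions[start]
    horizontal = np.linalg.norm(displacement[[0, 2]])  # x/z plane, y assumed up
    return "STEP" if horizontal > 0.6 else "HAND_MOTION"
```

In a real system, such hand-crafted rules would be replaced by the actual MTM classes and validated against manual transcriptions such as the ones created for this study.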

This leads to the next research question regarding the number of trackers and their placement on a user’s body for capturing body motions. In our study, we track the head position together with one hand-held controller. However, this is likely not sufficient to identify all body motions and poses, e.g., bending. This issue could be addressed by attaching additional trackers to the user’s pelvis or feet. Further studies could focus on the optimal number and placement of trackers or the most suitable VR hardware setup to capture and further analyze body motions.

Another issue for the automated transcription could be separating intended body motions required for completing the task from unintended body motions caused by handling the VR system itself. Such unintended body motions could result from an uncomfortable attachment of the HMD or an interfering cable required by the VR system.

In order to move towards an even more detailed analysis such as MTM-1, we envision also capturing precise hand gestures. There are different ways to recognize gestures, but the most common ones used in VR are optical and inertial tracking. For optical tracking, systems like the Oculus Quest II could be used. Inertial tracking for hand gesture recognition usually relies on gloves with mounted inertial sensors, e.g., Sensoryx gloves. However, using inertial gloves or optical tracking instead of controllers may be less accepted by users, since haptic feedback cannot yet be provided and thus grasping a virtual object may feel unnatural.