
1 Introduction

With the shift from mass production to mass customisation [1], combined with a growing labour shortage caused by unfavourable demographic change [2], the competitiveness of tomorrow's assembly industry will depend on flexible, easy-to-program automation systems. Learning by demonstration promises a solution: it enables a robotic system to be programmed through intuitive demonstrations [3]. In recent years, trajectory-level task models, including Dynamic Movement Primitives (DMP), have proven successful in reproducing assembly-related movements from human demonstrations [4].

Because Dynamic Movement Primitives minimise teaching time through one-shot learning and reproduce accurate trajectories with temporal and spatial scalability [5], they are considered to satisfy key requirements for competitiveness in an industrial environment. However, handling complex tasks remains a major bottleneck of DMP and other trajectory-level task models [4].

In this work, the promising concept of embedding trajectory-level models within high-level symbolic task representations to tackle complex tasks is investigated further [3, 6]. Whereas other approaches often rely on simple and limited symbolic frameworks, the proposed optimised DMP framework utilises the industry-established Methods-Time Measurement (MTM) system, which provides a comprehensive and elaborate structure for assembly tasks. The two well-proven methods of DMP-based learning by demonstration and MTM-based assembly task analysis are thus combined to address the situation outlined above.

The remainder of the paper is organised as follows. Section 2 outlines the background and state of the art for Dynamic Movement Primitives and Methods-Time Measurement. Section 3 presents our conceptual design of an industry-oriented, MTM-1-based optimised DMP framework. Section 4 provides an experimental validation of the framework on a generic pick-and-place operation, followed by a conclusion and discussion of future work in Sect. 5.

2 Background

This section summarises the theoretical background of Dynamic Movement Primitives and Methods-Time Measurement and the state-of-the-art relevant to the proposed framework.

2.1 Dynamic Movement Primitives

Dynamic Movement Primitives were initially introduced by Schaal et al. [7] in 2003. The revised formulation by Saveriano et al. [8] is considered the state of the art for Cartesian space Dynamic Movement Primitives (CDMP) and is used in the proposed framework. It divides the task space into two transformation systems, each formed as a second-order dynamical system, to capture the translational (1) and rotational (2) dimensions.

$$\begin{aligned} \tau \dot{\mathbf{v}} &= \mathbf{K}^p \left[ (\mathbf{p}_g - \mathbf{p}) - (\mathbf{p}_g - \mathbf{p}_0)\, s + \mathbf{f}^p(s) \right] - \mathbf{D}^p \mathbf{v} + \boldsymbol{\chi}^p \\ \tau \dot{\mathbf{p}} &= \mathbf{v} \end{aligned} \tag{1}$$
$$\begin{aligned} \tau \dot{\boldsymbol{\omega}} &= \mathbf{K}^q \left[ \mathbf{e}_0(\mathbf{q}_g, \mathbf{q}) - \mathbf{e}_0(\mathbf{q}_g, \mathbf{q}_0)\, s + \mathbf{f}^q(s) \right] - \mathbf{D}^q \boldsymbol{\omega} + \boldsymbol{\chi}^q \\ \tau \dot{\mathbf{q}} &= \tfrac{1}{2} \left[ 0,\ \boldsymbol{\omega}^T \right]^T \times \mathbf{q} \end{aligned} \tag{2}$$

The position, linear velocity, and linear acceleration are denoted by \(\mathbf{p}\), \(\mathbf{v}\), \(\dot{\mathbf{v}} \in \mathbb{R}^{3}\). The unit quaternion \(\mathbf{q} \in S^{3}\) represents the orientation, with \(\boldsymbol{\omega}\), \(\dot{\boldsymbol{\omega}} \in \mathbb{R}^{3}\) being the angular velocity and acceleration, and \(\mathbf{e}_0(\mathbf{q}_i,\mathbf{q}_j)\) denotes the orientation error between \(\mathbf{q}_i\) and \(\mathbf{q}_j\); the operator \(\times\) in (2) is the quaternion product. The time constant \(\tau\) enables temporal scaling, and the scalar phase variable s, governed by the canonical system, removes the explicit time dependency. The constants \(\mathbf{p}_0\), \(\mathbf{q}_0\) and \(\mathbf{p}_g\), \(\mathbf{q}_g\) denote the start and goal poses, respectively. The positive definite matrices \(\mathbf{K}^i\), \(\mathbf{D}^i\) are stiffness and damping gains. The forcing terms \(\mathbf{f}^i(s)\) preserve the non-linear shape of a demonstrated trajectory through weighted radial basis functions (RBF) \(\text{w}_n \psi_n(s)\). The term \(\boldsymbol{\chi}^i\) represents any extension to the dynamical system. For an in-depth explanation of DMPs, see [5].
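To make Eq. (1) concrete, the following minimal sketch implements one translational axis in plain Python. It assumes diagonal gains (so the three axes decouple), \(\boldsymbol{\chi}^p = 0\), a canonical system \(s = \exp(-\alpha_s t/\tau)\) with \(\alpha_s = -\ln 0.001\) and \(\tau\) equal to the demonstrated duration, and locally weighted regression for the weights \(\text{w}_n\). The function names, the basis-width heuristic and the minimum-jerk demonstration are illustrative assumptions, not the implementation used in this work.

```python
import math

ALPHA_S = -math.log(0.001)  # assumed: s = exp(-ALPHA_S * t / tau) reaches 0.001 at t = tau


def gradient(x, dt):
    """Central finite differences with one-sided ends (like numpy.gradient)."""
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append((x[hi] - x[lo]) / ((hi - lo) * dt))
    return out


def learn_axis(demo, dt, n_rbf=10, K=100.0):
    """Fit the forcing-term weights w_n of Eq. (1) for one axis, with chi^p = 0."""
    D = 2.0 * math.sqrt(K)                    # critical damping
    tau = dt * (len(demo) - 1)                # tau taken as the demonstrated duration
    p0, g = demo[0], demo[-1]
    vel = gradient(demo, dt)
    acc = gradient(vel, dt)
    # RBF centres equally spaced in time, mapped to phase; widths from centre spacing
    c = [math.exp(-ALPHA_S * n / (n_rbf - 1)) for n in range(n_rbf)]
    h = [1.0 / (c[n] - c[n + 1]) ** 2 for n in range(n_rbf - 1)]
    h.append(h[-1])
    s = [math.exp(-ALPHA_S * k * dt / tau) for k in range(len(demo))]
    # invert Eq. (1) with v = tau * dp/dt to get the target forcing value per sample
    f_target = [(tau ** 2 * a + D * tau * v) / K - (g - p) + (g - p0) * sk
                for p, v, a, sk in zip(demo, vel, acc, s)]
    # locally weighted regression for f(s) = s * sum(psi_n * w_n) / sum(psi_n)
    w = []
    for cn, hn in zip(c, h):
        psi = [math.exp(-hn * (sk - cn) ** 2) for sk in s]
        num = sum(sk * ps * f for sk, ps, f in zip(s, psi, f_target))
        den = sum(sk * sk * ps for sk, ps in zip(s, psi)) + 1e-12
        w.append(num / den)
    return {"w": w, "c": c, "h": h, "K": K, "D": D, "tau": tau, "p0": p0, "g": g}


def rollout(m, tau=None, p0=None, g=None, dt=0.001):
    """Integrate Eq. (1) over 2 * tau with semi-implicit Euler; returns (t, p) pairs."""
    tau = m["tau"] if tau is None else tau
    p0 = m["p0"] if p0 is None else p0
    g = m["g"] if g is None else g
    K, D = m["K"], m["D"]
    p, v, t, out = p0, 0.0, 0.0, [(0.0, p0)]
    for _ in range(int(round(2 * tau / dt))):
        s = math.exp(-ALPHA_S * t / tau)
        psi = [math.exp(-hn * (s - cn) ** 2) for cn, hn in zip(m["c"], m["h"])]
        f = s * sum(ps * wn for ps, wn in zip(psi, m["w"])) / (sum(psi) + 1e-12)
        v += (K * ((g - p) - (g - p0) * s + f) - D * v) / tau * dt
        p += v / tau * dt
        t += dt
        out.append((t, p))
    return out


# usage: a 1-s minimum-jerk demonstration from 0 to 1, sampled at 100 Hz
dt = 0.01
demo = [10 * (k * dt) ** 3 - 15 * (k * dt) ** 4 + 6 * (k * dt) ** 5 for k in range(101)]
model = learn_axis(demo, dt)
normal = rollout(model)                         # demonstrated speed
fast = rollout(model, tau=model["tau"] / 2)     # halving tau doubles the speed
shifted = rollout(model, p0=0.03)               # new start position, same goal
```

Because \(\tau\) scales every time derivative in Eq. (1), halving it replays the same path in half the time, and overriding \(\mathbf{p}_0\) shows how the model adapts to a different start position while still converging to \(\mathbf{p}_g\).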

In early work preceding DMPs, Schaal et al. [6] (1999) showed that a compact state-action-state sequence is a natural prerequisite for task imitation with movement primitives, expressing states such as aligned, in contact, and near-to, and actions such as move-to, grasp-object, and move-above. Such a combination of low- and high-level task representations is still promoted for handling compound actions [3]. Assuming that most human hand movements can be segmented into reach, manipulation, and withdraw phases, Mao et al. [9] reproduced a chopping task by identifying grasp/release transitions and key manipulation points. Aein et al. [10] developed a three-level task model architecture based on an action-grammar analogy, whose low-level controller provided arm movement primitives for position and force control and hand primitives for open, close, grasp, and ungrasp. Eiband et al. [11] defined four robot skills (gripper open, gripper close, free movement, and haptic exploration) to establish a tree that describes the geometric relationships between consecutive skills. Caccavale et al. [12] investigated complex dual-arm household tasks, combining a low-level segmentation based on object proximity (near/far) and explicit human commands (open/close gripper) with a high-level attentional, behaviour-based system that structures the identified movement primitives.

While reasonable symbolic frameworks have been explored, none of them builds on a sophisticated, industry-proven structure, which limits their suitability for realistic industrial assembly operations.

2.2 Methods-Time Measurement

Introduced in 1948 by H. Maynard et al. [13], Methods-Time Measurement ranks among the most established predetermined motion time systems in today's industry. The MTM-1 variant, designed to analyse short-cycle repetitive work, is demonstrably capable of segmenting most manual assembly-related operations and methods. The proposed framework is built on five of its basic elements, namely reach, grasp, move, position, and release, which are defined in Table 1.

Table 1 Properties of MTM-1 basic elements after [13]

Besides its intended use in designing workplaces and work methods, MTM has also proven valuable in robotics. Drumwright et al. [14] developed primitive actions for task-level programming of humanoid robots based on the MTM-1 basic elements. With the growing interest in human-robot interaction, the MTM-1 framework was assessed for the analysis of workspaces that incorporate robots [15]. Finally, recent research has explored automating the classification of handling tasks according to MTM-1 using machine learning techniques [16], which promises to greatly simplify its application in the proposed learning-by-demonstration context.

3 The MTM-based Optimised CDMP Framework

The proposed framework for tackling complex assembly tasks embeds the trajectory-level CDMP model within the industry-established MTM-1 system as the high-level task representation. Compared to other approaches, it benefits from a comprehensive and proven structure for industrial assembly tasks and accounts for the distinctive properties of individual subskills. In this section, customised CDMP models are designed to reflect the differentiating properties of the five MTM-1 basic elements. The MTM-based optimised CDMP framework is summarised in Fig. 1 and explained in detail below.

Fig. 1

The MTM-optimised CDMP framework (Remark: The specific parameterisation is subject to the robot’s capabilities and the application’s requirements)

REACH—The sequence of subskills typically begins with reaching towards a workpiece, where time efficiency and movement generalisation are essential. Since the covered distance primarily dictates the time efficiency of the reach subskill, the temporal scaling property of CDMP models becomes valuable, especially when the task was demonstrated at reduced speed. Temporal scaling is achieved by adjusting the time constant \(\tau \), which changes the robot's end-effector velocity during reproduction without further effort. Since accuracy is considered less important when approaching the workpiece, a low number of RBF is recommended. This yields a smoother trajectory that removes shaky discrepancies from the demonstration and reduces the computational cost. As human demonstrations are often suboptimal for the robot's kinematics, the weights \(\text {w}_n\) may be further optimised using reinforcement learning [5].
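The trade-off between the number of RBF, smoothness and accuracy can be illustrated without a full CDMP. The sketch below fits a shaky one-dimensional demonstration with 10 and with 200 normalised Gaussian basis functions, using the same locally weighted averaging idea as the forcing-term fit. The signal, the 10 Hz shake, the width rule and all names are assumptions chosen for illustration, and time is used as the regression variable instead of the phase s for brevity.

```python
import math


def fit_rbf(t, y, n_rbf):
    """Fit samples y(t) with n_rbf normalised Gaussian RBFs equally spaced in time.

    Each weight is a Gaussian-weighted local average of the data (locally weighted
    regression with a constant model); sigma = half the centre spacing is an
    illustrative choice. Returns a predictor t -> y_hat.
    """
    spacing = (t[-1] - t[0]) / (n_rbf - 1)
    sigma = 0.5 * spacing
    centres = [t[0] + i * spacing for i in range(n_rbf)]

    def psi(ti, ci):
        return math.exp(-0.5 * ((ti - ci) / sigma) ** 2)

    weights = []
    for ci in centres:
        ps = [psi(ti, ci) for ti in t]
        weights.append(sum(p * yi for p, yi in zip(ps, y)) / sum(ps))

    def predict(ti):
        ps = [psi(ti, ci) for ci in centres]
        return sum(p * w for p, w in zip(ps, weights)) / sum(ps)

    return predict


def total_variation(x):
    """Sum of absolute sample-to-sample changes; grows with shakiness."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


# illustrative demonstration: smooth motion plus a 10 Hz shake, 2 s at 100 Hz
t = [k / 100 for k in range(201)]
demo = [math.sin(math.pi * ti) + 0.1 * math.sin(20 * math.pi * ti) for ti in t]

for n in (10, 200):
    fit = fit_rbf(t, demo, n)
    repro = [fit(ti) for ti in t]
    max_err = max(abs(r - d) for r, d in zip(repro, demo))
    print(f"{n:3d} RBF: total variation {total_variation(repro):.2f}, max error {max_err:.3f}")
```

With 10 RBF the shake is averaged out and the total variation of the reproduction drops, at the price of slightly flattening the motion itself, which is tolerable for reaching. The 200-RBF fit tracks the demonstration closely, shake included, and evaluates twenty times as many basis functions per sample.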

Besides the temporal scaling property of CDMP models, the spatial scaling property provides additional advantages for this subskill. While CDMP models inherently cope with deviating start poses, the goal pose \({\textbf {p}}_g\), \({\textbf {q}}_g\) can also be adjusted in real time through a goal switching mechanism, as described in [5]. An object recognition method may be applied to detect different workpieces and determine the goal pose for the reach CDMP model. Finally, the CDMP model of the reach subskill generalises further by adapting its trajectory when obstacles appear in its path. This can be realised through a CDMP extension for volumetric obstacle avoidance, as explored in [17].
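As one common way to realise real-time goal switching (an assumption here; [5] may describe a different mechanism), the goal used in Eq. (1) can be passed through a first-order filter \(\tau \dot{\textbf{p}}_g = \alpha_g (\textbf{p}_g^{\text{new}} - \textbf{p}_g)\), so that a newly detected goal does not cause a step in the commanded acceleration. The sketch uses one axis, \(\textbf{f}^p = 0\), and hypothetical values for \(\alpha_g\) and the switching time.

```python
import math


def rollout_with_goal_switch(p0, g_old, g_new, t_switch, tau=1.0, K=100.0,
                             alpha_g=20.0, dt=0.001):
    """Integrate Eq. (1) for one axis with f^p = 0 over 2 * tau and a filtered goal.

    At t_switch the target goal jumps from g_old to g_new; the goal g used in
    Eq. (1) follows it smoothly via tau * dg/dt = alpha_g * (target - g).
    Returns the position and goal histories.
    """
    D = 2.0 * math.sqrt(K)                 # critical damping
    alpha_s = -math.log(0.001)             # phase reaches 0.001 at t = tau
    p, v, g, t = p0, 0.0, g_old, 0.0
    positions, goals = [p], [g]
    for _ in range(int(round(2 * tau / dt))):
        target = g_new if t >= t_switch else g_old
        g += alpha_g * (target - g) / tau * dt     # smooth goal update, no step change
        s = math.exp(-alpha_s * t / tau)
        v += (K * ((g - p) - (g - p0) * s) - D * v) / tau * dt
        p += v / tau * dt
        t += dt
        positions.append(p)
        goals.append(g)
    return positions, goals


# usage: reach towards 0.5 m, re-targeted to 0.8 m after 0.4 s
positions, goals = rollout_with_goal_switch(0.0, 0.5, 0.8, t_switch=0.4)
```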

GRASP—After reaching the target position close to the workpiece, the grasp subskill commences. In contrast to the reach subskill, a much shorter distance is to be bridged. However, grasping requires higher accuracy, which determines the success of the grasp operation.

Based on this requirement, a high number of RBF is recommended to replicate the demonstrated grasp subskill. To reduce the risk of damaging inertial forces or of exceeding control limitations, a reproduction speed equal to or slower than that of the demonstration is desirable; this is realised by keeping or increasing the time constant \(\tau \) of the CDMP model. If the hardware setup permits, additional visual or force feedback may be considered to further improve accuracy. Finally, the gripper actuation may be reproduced through a simple DMP model driven by the same canonical system to guarantee correct actuation timing.
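Driving the gripper from the same canonical system means that its actuation instant is stored as a phase value rather than as a time. The sketch below is a simplification that replaces the gripper DMP by a phase-triggered close command; it shows that the closing instant then scales automatically with \(\tau\). The function names and the \(\alpha_s\) value are assumptions.

```python
import math

ALPHA_S = -math.log(0.001)   # assumed canonical-system constant


def phase(t, tau):
    """Canonical system shared by arm and gripper: s(t) = exp(-ALPHA_S * t / tau)."""
    return math.exp(-ALPHA_S * t / tau)


def gripper_close_time(t_close_demo, tau_demo, tau_repro, dt=0.001):
    """Time at which a phase-triggered gripper closes during reproduction.

    The demonstrated closing instant is converted to a phase value; the control
    loop then steps the shared phase and closes once it has decayed past it.
    """
    s_close = phase(t_close_demo, tau_demo)
    t = 0.0
    while phase(t, tau_repro) > s_close:
        t += dt
    return t


# usage: closing demonstrated 0.6 s into a 1-s grasp, replayed at half speed
print(round(gripper_close_time(0.6, 1.0, 2.0), 3))
```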

MOVE—Once the workpiece is grasped and lifted sufficiently to allow free movement, the move subskill is initiated to transport it close to its destination. As this subskill also involves a large motion in which accuracy is less relevant, the same efficiency and generalisation considerations as for the reach element apply. However, the properties of the transported workpiece, including its weight, dimensions, and fragility, have to be considered.

Within these constraints, the time constant \(\tau \) may be decreased to improve time efficiency. A lower number of RBF to reproduce the demonstrated trajectory smooths out shaky demonstration motions and reduces computational costs. If the weights of the forcing terms \({\textbf {f}}^i(s)\) are optimised through reinforcement learning, as discussed for the reach subskill, the workpiece dimensions must be taken into account.

Regarding generalisation, the start pose is given by the end pose of the preceding grasp CDMP model. Goal pose adjustments may be incorporated in real time, as discussed for the reach subskill. As with the reinforcement learning augmentation, the workpiece dimensions must be considered when applying obstacle avoidance methods.

POSITION—The position subskill represents the most challenging aspect of an assembly task. It covers aligning, orienting, and engaging the grasped workpiece with its designated location relative to another object. As with the grasp subskill, accuracy is vital for the success of this subskill. However, a fundamental characteristic of positioning is the occurrence of contact forces and torques, which can significantly affect its execution. To improve accuracy, a high RBF density is recommended to replicate the demonstrated motion. Since accurate execution matters more than speed, a suitably high time constant \(\tau \), and hence a slow reproduction, may be selected.

Beyond the achievable positional accuracy, accounting for contact forces and torques promises to make the position subskill more robust. These should therefore be incorporated into the CDMP model, which can be realised in different ways [5].
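One simple way to incorporate contact forces, assumed here rather than taken from [5], is to feed the deviation between the measured and the demonstrated forces into the extension term \(\boldsymbol{\chi}^p\) of Eq. (1), so that the reproduction yields when it presses harder than the human teacher did. The gain \(k_f\), the deadband and the sign convention below are hypothetical and would need tuning to the robot and sensor.

```python
import math


def force_coupling(f_measured, f_demo, k_f=0.01, deadband=1.0):
    """Hypothetical coupling term chi^p for Eq. (1) during the position subskill.

    Both arguments are (x, y, z) forces in newtons exerted by the environment on
    the tool. Deviations within the deadband are ignored; larger ones are passed
    on proportionally, so the tool yields along any excess contact force.
    """
    chi = []
    for fm, fd in zip(f_measured, f_demo):
        err = fm - fd
        if abs(err) <= deadband:
            err = 0.0
        else:
            err -= math.copysign(deadband, err)   # keeps chi continuous at the deadband edge
        chi.append(k_f * err)
    return chi


# usage: 7 N more upward reaction force in z than demonstrated
print(force_coupling([0.0, 0.0, 12.0], [0.0, 0.0, 5.0]))
```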

RELEASE—The position subskill terminates when the workpiece is successfully aligned and oriented and no interfering forces are recorded. Once this state is reached, the release subskill begins by actuating the gripper and ends after a collision-free disengagement from the workpiece. As in the preceding position subskill, high accuracy and reduced reproduction speed characterise the release CDMP model. Monitoring for noticeable forces may be used to ensure that the gripper does not disturb the workpiece during disengagement.
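The termination condition of the position subskill and the force monitoring during release can be expressed as simple checks. The tolerances, the (w, x, y, z) quaternion convention and the function names below are hypothetical.

```python
import math


def quat_angle(q1, q2):
    """Rotation angle (rad) between unit quaternions given as (w, x, y, z)."""
    d = min(1.0, abs(sum(a * b for a, b in zip(q1, q2))))   # |<q1, q2>|, sign-invariant
    return 2.0 * math.acos(d)


def position_complete(p, q, p_goal, q_goal, force,
                      pos_tol=0.002, rot_tol=math.radians(2.0), force_tol=2.0):
    """Hypothetical end condition of the position subskill: the workpiece is aligned
    (m), oriented (rad) and no interfering contact force (N) remains."""
    aligned = math.dist(p, p_goal) <= pos_tol
    oriented = quat_angle(q, q_goal) <= rot_tol
    force_free = math.sqrt(sum(f * f for f in force)) <= force_tol
    return aligned and oriented and force_free


def release_disturbs_workpiece(force_log, baseline, threshold=2.0):
    """Flag any force sample during disengagement that deviates from the force
    measured after positioning (baseline) by more than threshold newtons."""
    return any(abs(f - baseline) > threshold for f in force_log)
```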

4 Experimental Evaluation

The proposed MTM-1-based optimised CDMP framework was evaluated in a generic pick-and-place experiment, in which a toy dice (\(8\ \text {cm} \times 8\ \text {cm} \times 8\ \text {cm}\)) is picked up from its initial location and placed onto a stationary assembly jig with a \(9\ \text {cm}\ \times 9\ \text {cm}\ \times 1\ \text {cm}\) recess (see Fig. 2). Based on a human demonstration via kinesthetic teaching, the task was reproduced using the MTM-based optimised CDMP framework and compared with two one-model-fits-all CDMP models of different accuracy levels.

Fig. 2

Experimental setup

4.1 Experimental Setup

The experiment was conducted on a UR5e robot from Universal Robots with an OnRobot RG6 gripper. An ATI Axia80 F/T sensor installed in the assembly jig measured the wrench during positioning (see Fig. 2). The free-drive mode of the UR5e was used for demonstration. End-effector Cartesian poses were recorded at 100 Hz, while the gripper was actuated manually using the teach pendant. During the transport of the workpiece, an artificial disturbance was introduced by briefly shaking the end-effector. The human teacher signalled the desired transitions between the five subskills by briefly pausing the movement.
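Since the teacher marked subskill transitions by pausing, the recorded poses can be segmented by searching for sufficiently long low-speed intervals. The sketch below assumes positions sampled at 100 Hz and hypothetical thresholds for the speed and the minimum pause length; hesitations shorter than that length do not split a segment.

```python
import math
from itertools import groupby


def segment_at_pauses(positions, rate_hz=100.0, v_thresh=0.005, min_pause_s=0.3):
    """Split a demonstration into subskills at deliberate pauses (hypothetical thresholds).

    positions: (x, y, z) end-effector samples in metres recorded at rate_hz.
    A pause is a run of speeds below v_thresh (m/s) lasting at least min_pause_s.
    Returns (start, end) position indices, both inclusive, of the moving segments.
    """
    dt = 1.0 / rate_hz
    speed = [math.dist(a, b) / dt for a, b in zip(positions, positions[1:])]
    min_len = int(round(min_pause_s * rate_hz))
    segments, idx = [], 0
    for is_slow, run in groupby(speed, key=lambda v: v < v_thresh):
        n = len(list(run))
        if is_slow and n >= min_len:
            idx += n                                  # deliberate pause closes the segment
            continue
        if segments and segments[-1][1] == idx:       # short hesitation: extend the segment
            segments[-1] = (segments[-1][0], idx + n)
        else:
            segments.append((idx, idx + n))
        idx += n
    return segments
```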

For the MTM-based CDMP framework, the demonstration data was separated into the subskills and fed to individual CDMP models as described in Sect. 3. In accordance with the proposed framework, the reach and move CDMP models were simplified to 10 RBF and doubled in speed by halving the time constant \(\tau \). In contrast, the grasp, position, and release CDMP models were generated with 200 RBF to improve their accuracy and kept the time constant \(\tau \) of the demonstration. All other CDMP parameters were identical across subskills, including \(K^i\) set to 100, critically damped \(D^i\), the canonical system's parameter \(\alpha _s = -\ln (0.001) \cdot T\), and RBF centres equally distributed in time with a width of 2. As the subskill transitions occurred at zero velocity, the individual CDMP sequences were merged following the approach suggested by Saveriano et al. [8], with each initial pose being the end pose of the preceding CDMP subskill. The generalisation to a new start pose was examined by adding an offset of \(+3\ \text {cm}\) in each translational dimension to the start position of the demonstration data. For comparison, the one-model-fits-all CDMP approach was used twice, with 10 and 200 RBF per subskill, no temporal scaling, and all other CDMP parameters unchanged. During reproduction, the gripper was actuated manually by the human operator. The offline processing of the demonstration data and the CDMP calculations were conducted in MATLAB (the code is accessible at https://github.com/VictorHerMor/2022-mtm-based-dynamic-movement-primitives-mhi).
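The per-subskill settings described above (10 RBF and halved \(\tau\) for reach and move, 200 RBF and unchanged \(\tau\) for grasp, position and release) can be summarised as a small configuration, together with the chaining rule that each subskill starts at the end pose of its predecessor. The sketch below is a hypothetical Python rendering, not the MATLAB code used in the experiment, and is restricted to positions for brevity; the shared gains \(K^i = 100\) and critical damping are omitted because they do not vary between subskills.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubskillConfig:
    n_rbf: int          # number of radial basis functions
    tau_scale: float    # reproduction tau relative to the demonstrated duration


# settings from the experiment: fast, smooth reach/move; accurate grasp/position/release
MTM_CONFIG = {
    "reach": SubskillConfig(n_rbf=10, tau_scale=0.5),
    "grasp": SubskillConfig(n_rbf=200, tau_scale=1.0),
    "move": SubskillConfig(n_rbf=10, tau_scale=0.5),
    "position": SubskillConfig(n_rbf=200, tau_scale=1.0),
    "release": SubskillConfig(n_rbf=200, tau_scale=1.0),
}


def plan(demo_segments, start_offset=(0.0, 0.0, 0.0)):
    """Turn demonstrated segments into a chained reproduction plan.

    demo_segments: ordered (name, duration_s, start_pos, end_pos) tuples.
    Returns (name, n_rbf, tau, start_pos, goal_pos) tuples in which each subskill
    starts at the goal of its predecessor; start_offset shifts the first start.
    """
    steps, prev_goal = [], None
    for name, duration, start, end in demo_segments:
        cfg = MTM_CONFIG[name]
        if prev_goal is None:
            start = tuple(a + b for a, b in zip(start, start_offset))
        else:
            start = prev_goal
        steps.append((name, cfg.n_rbf, cfg.tau_scale * duration, start, end))
        prev_goal = end
    return steps
```

With hypothetical demonstrated durations of 4, 2, 6, 3 and 1 s, the plan's \(\tau\) values sum to 11 s instead of 16 s, because only reach and move are accelerated.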

4.2 Results and Discussion

Figure 3 compares the translational dimensions of the proposed MTM-based optimised CDMP framework with those of the one-model-fits-all CDMP approaches, revealing four essential differences. The durations of the reach and move subskills are indicated by the green and blue highlighted areas in the y-dimension graph. A comparison with the demonstration data shows that the desired end poses were reached after half the time of the respective subskills, reducing the overall reproduction duration by approximately 10 s. The \(3\ \text {cm}\) offsets introduced in each translational dimension were eliminated during the reach subskill, demonstrating the capability to cope with different start positions (green circles). The artificial disturbance introduced during the move subskill (around 30 s, blue circles) was smoothed out by the MTM-based optimised CDMP framework, while the required accuracy during the grasp, position, and release subskills was maintained. In contrast, the one-model-fits-all CDMP approach with 10 RBF per subskill loses this accuracy, as highlighted by the red circles where critical dips in the z-dimension appear. Furthermore, the high-accuracy one-model-fits-all CDMP approach (200 RBF per subskill) matches the demonstration data closely, including the artificial disturbance during the move subskill; however, its computational costs are \(36\ \%\) higher than those of the one-model-fits-all alternative with only 10 RBF per subskill. In comparison, the MTM-based optimised CDMP framework increases the computational costs by only \(6\ \%\).

Fig. 3

Translation during demonstration, one-model-fits-all CDMP approach with 10 RBF per subskill, and MTM-based optimised CDMP framework

Figure 4 shows the measured forces in the z-direction during the position subskill. A post-assessment of the force profile verifies that the dice was successfully placed on the assembly jig, as demonstration and reproduction reach an identical end value. Furthermore, the forces occurring during reproduction did not exceed those during the demonstration, suggesting a damage-free task replication.
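The post-assessment of the force profile can be written as two comparisons between the demonstrated and the reproduced z-force signals. The averaging window, the tolerance and the function name below are hypothetical choices, not values from the experiment.

```python
def assess_placement(f_demo, f_repro, n_end=10, end_tol=0.5):
    """Post-assess z-force profiles (N) of demonstration and reproduction.

    placed: the means of the last n_end samples agree within end_tol, i.e. the
    workpiece rests in the jig in the same way in both cases.
    damage_free: the reproduction's peak |force| does not exceed the demonstration's.
    """
    end_demo = sum(f_demo[-n_end:]) / n_end
    end_repro = sum(f_repro[-n_end:]) / n_end
    placed = abs(end_demo - end_repro) <= end_tol
    damage_free = max(abs(f) for f in f_repro) <= max(abs(f) for f in f_demo)
    return placed, damage_free
```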

Fig. 4

ATI Axia80 z-force measurement during the position subskill (human demonstration, MTM-based optimised CDMP framework)

In summary, distinguishing between the five MTM-1 basic elements and designing a characteristic CDMP model for each brings the decisive benefit of addressing their unique requirements, paving the way for tackling compound and complex assembly operations.

5 Conclusion and Future Work

While Dynamic Movement Primitives are considered a promising approach for robotic learning by demonstration, on their own they struggle to handle complex assembly tasks. This paper has presented a method that addresses this limitation by distinguishing subskills on a symbolic level provided by the industrially well-established MTM-1 framework. Five distinct CDMP models were defined, each designed to match the individual characteristics of one of the MTM-1 basic elements reach, grasp, move, position, and release. The proposed method was evaluated on a pick-and-place assembly task and showed clear benefits over the one-model-fits-all CDMP approach, including appropriate time management, matching accuracy in the relevant phases of the assembly task, and force monitoring at adequate times.

With the presented experimental results demonstrating a proof of concept, the framework's optimisation shows potential for further analysis. While the proposed approach currently relies on expert knowledge to parameterise the CDMP models, a rigorous mathematical analysis of the design decisions and their implementation would make the system more robust. Furthermore, the full potential of the proposed method is yet to be explored, including its analogy to human efficiency via the predetermined motion times and further abstraction through more elaborate subsequent MTM variants. Finally, the transferability and generalisation to other robot systems and applications will be investigated in future work.