1 Introduction

The current state of capabilities for robot manipulation in domains such as in-home assistance, search and rescue, and natural or military disasters lags behind human capability and speed. One particular capability at which humans (and other animals) excel is using haptic feedback to operate effectively in clutter. Robots with this capability could potentially perform tasks better in constrained and dynamic scenarios such as reaching into containers or cupboards without line-of-sight, performing search and rescue in debris, or working alongside human co-workers.

In this paper, we specifically consider the problem of a robot arm (i.e., a serial manipulator) reaching into clutter in order to move its end effector to a target position. We define success to be when all contact forces that occur are low and the end effector attains a position close to the target. Success does not depend on the orientation of the end effector. In contrast to our previous research, we focus on enabling the robot to reach the target position in a short amount of time. As such, dynamic phenomena, such as inertia and impact forces, play an important role.

Model predictive control (MPC) offers a promising approach to handling the multiple objectives and constraints associated with reaching in clutter. As we have previously discussed in Jain et al. (2013), while reaching in dense clutter, a robot is likely to make contact between its arm and the environment at multiple locations. Moreover, the robot is unlikely to anticipate each contact event before it occurs. In Jain et al. (2013), we presented a single-time-step model predictive controller that used a quasistatic mechanical model of the robot in contact with the environment. This controller, which we refer to as quasistatic MPC, achieved good empirical performance across a variety of circumstances in terms of enabling robots to successfully reach target locations while keeping contact forces low. Quasistatic MPC assumed that the robot had whole-arm tactile sensing and compliant joints. Our results also provided evidence that, when reaching in clutter, whole-arm tactile sensing and joint-torque sensing together enable superior performance compared to joint-torque sensing alone or joint-torque sensing with a force–torque sensor on each link for contact force sensing.

However, the quasistatic model used by quasistatic MPC did not account for dynamic properties, such as link inertia and joint damping. As expected, the robot performed best when moving at low velocities for which the dynamics become negligible. Many tasks would benefit from faster robots, but effectively controlling forces from multiple intermittent and unexpected contacts presents a substantial challenge. Faster end effector velocities tend to result in both higher impact forces and higher forces from persistent contact. Even intentionally slow motion with compliant joints can result in dynamic phenomena, such as when an arm with a preloaded compliant joint slips off of an object, allowing the joint to release its stored energy and thereby accelerate the robot links.

In this paper, we present a controller that enables a robot manipulator to rapidly move its end effector and simultaneously control contact forces in the presence of multiple contacts along the entire robot arm. We present a multi-time-step model predictive controller that models dynamic phenomena relevant to the task of reaching in clutter. This controller, which we refer to as dynamic MPC, models the dynamics of the arm in contact with the environment in order to predict how contact forces will change and how the robot’s end effector will move. Among other factors, the model considers link inertia and damping at the robot’s joints. Dynamic MPC also models how joint velocities will influence potential impact forces based on a collision model. This enables the robot to select joint velocities that mitigate potential impact forces. At each time step, dynamic MPC uses linear models that locally approximate the nonlinear dynamics in order to generate a convex optimization problem that it can solve efficiently.

As with our previous work, we assume the following:

  • Contact that results in forces below a user defined threshold is acceptable.

  • The robot has some form of compliance at its joints (either active or passive).

  • Tactile sensors cover the entire surface of the robot arm.

We evaluate our controller empirically. Through tens of thousands of trials in simulation, we show that with dynamic MPC a simulated robot can, on average, reach goals 1.4 to 2 times faster than the required time for our previous controller, while attaining comparable success rates and fewer occurrences of high forces. As expected, dynamic MPC performed better in low-density clutter, where it could use the open space to accelerate. Likewise, it performed better when the controller allowed the robot to apply higher forces (25 N instead of 5 N) to the world, which increased the chance that the robot could slip off of one object and hit another. Interestingly, dynamic MPC also performed better in higher clutter with only low forces allowed, which is a situation that should be well-matched to quasistatic MPC.

We also conducted extensive trials with a real 7 degree-of-freedom (DoF) humanoid robot arm. Throughout our tests, dynamic MPC enabled the robot to rapidly reach locations in dense clutter while keeping contact forces low (see Fig. 1).

Fig. 1
figure 1

Front view of the robot Darci reaching into a cluttered environment

We have organized the paper as follows: Sect. 2 discusses related prior research. Section 3 covers the mathematical formulas and assumptions for formulating our controller. Section 4 describes our software, hardware and experimental setup. In Sect. 5, we show evidence for the accuracy of our simplified dynamic model over a short, but useful, time horizon. In Sect. 6 we present and discuss results from comparing quasistatic MPC and dynamic MPC in our software simulation testbed. We present results from a global reaching task with a real robot in Sect. 7. We present results from local tests that emphasize aspects of contact force control on a real robot with ground-truth force–torque data in Sect. 8. We use the term global reaching task to refer to a trial for which the robot’s end effector reaching a target location is an important measure of success. We use the term local test to refer to a trial that involves the robot maneuvering for a short period of time over a short distance without consideration of whether or not the robot’s end effector attains a target position. We conclude with a discussion of applications and an overview of our results in Sects. 9 and 10.

2 Related work

We first presented our dynamic controller in a conference paper published at Humanoids 2013 (Killpack and Kemp 2013). The conference paper only conveyed results from two simulated three-link planar robots. In this journal article, we present extensive results from dynamic MPC running on a real 7 DoF humanoid robot arm. In addition, we provide a more thorough formulation of the controller, details about how we altered the controller to run on the real robot, and an empirical evaluation of the fidelity of the controller’s approximate linear models.

Many common approaches to robotic manipulation are poorly matched to the challenges of reaching in dense clutter. Methods often rely on collision-free arm motion, line-of-sight sensing of the volume to be traversed, or detailed geometric models prior to reaching (Dogar and Srinivasa 2011; Hornung et al. 2012; Kavraki and laValle 2008; Leeper et al. 2013a, b; Saxena et al. 2008, 2011; Srinivasa et al. 2009; Stilman et al. 2007). Most robotic manipulation research has emphasized avoiding contact except at the end effector (Guo and Hsia 1993; Katz et al. 2013; Latombe 1990; Stilman 2010) or other single point contact locations (De Luca et al. 2006; De Luca and Ferrajoli 2008), even as the robot attempts to operate in unstructured environments. Such restrictions on contact unnecessarily limit a robot’s actions when low-force contact is benign and allowable for a given task. Our approach allows multiple contacts along the entire arm surface and uses only haptic feedback to reach through unknown environments.

Work in Erez and Todorov (2012) shows that contact along the arm may be permissible when performing a task, but requires detailed geometric models of the environment, assumes rigid contact, and does not use sensor feedback during the tasks. While results in Mordatch et al. (2012), and Mordatch et al. (2012) use an optimal control formulation and explicit contact modeling to perform multi-contact tasks, but produce open-loop trajectories and do not use online haptic feedback.

The majority of robotics research on unwanted or unmodeled collisions has focused on reacting to impacts after collision has occurred (De Luca et al. 2006; De Luca and Mattone 2004; Haddadin et al. 2008). This includes work that quantifies forces during impact (Phan et al. 2011) using novel sensing technology and work that models the instantaneous stiffness effects during collision (Shin et al. 2011). Extensive research has aimed to quantify the potential for personal injury from robot-human collisions (Haddadin et al. 2011) and some work has been done to limit robot joint velocities accordingly (Haddadin et al. 2012). In our work, we use an impact-momentum model in our cost function to regulate joint velocities to mitigate contact forces from unexpected impacts without limiting all joint velocities uniformly.

In regards to our approach for control, one of the earliest application areas for MPC was chemical process control (Garcia et al. 1989). MPC is also often referred to as receding horizon control and has been used in work on the control of aerial vehicles (Abbeel et al. 2010; Bellingham et al. 2002). MPC has also been used in robot locomotion research (e.g, Erez et al. 2012; Manchester et al. 2011; Wieber 2006). In terms of robot manipulation, MPC has recently been used in applications such as bouncing a ball (Kulchenko and Todorov 2011), generating manipulator trajectories to compensate for inertial forces on a boat (From et al. 2011), controlling a 6 DoF cable-driven parallel manipulator (Duchaine et al. 2007), and reaching in free space (Ivaldi et al. 2010).

3 Model predictive control formulation

In this section, we first explain the architecture of our low-level control framework. We then present the mathematical models our controller uses for predicting the motion and contact forces for our robot. For the prediction step of the controller, we use a forward, discrete-time prediction model. We subsequently show the form of our model predictive controller. Details about our previous work with quasistatic MPC, against which we compare performance with dynamic MPC, can be found in Jain et al. (2013).

The list of symbols at the beginning of the paper summarizes the nomenclature we will use. Lower case variables that are bold face are vectors, while upper case variables that are bold face are matrices, and any non-bold face variable is a scalar.

3.1 Overall controller structure

In our system, MPC runs on top of simple joint impedance control (see Hogan 1985; Hogan and Buerger 2005). We use simple joint impedance control due to its compliance and stability when in contact. Equation 1 defines the joint torque control vector, \({\varvec{\tau }}_{control}\), given a joint space stiffness matrix \({\varvec{K}}_{p}\), damping matrix \({\varvec{K}}_{d}\), equilibrium angles \({\varvec{q}}_{eq}\), joint angles \({\varvec{q}}\), and joint velocities \({\varvec{\dot{q}}}\). \({\varvec{\hat{G}}}({\varvec{q}})\) provides a gravity-compensating torque.

$$\begin{aligned} {\varvec{\tau }}_{control}\left( {\varvec{K}}_{p}, {\varvec{K}}_{d}, {\varvec{q}}_{eq}\right) = {\varvec{K}}_{p}\left( {\varvec{q}}_{eq}-{\varvec{q}}\right) - {\varvec{K}}_{d}{\varvec{\dot{q}}}+ {\varvec{\hat{G}}}\left( {\varvec{q}}\right) \end{aligned}$$
(1)

For our work, \({\varvec{K}}_{p}\) and \({\varvec{K}}_{d}\) are nonsingular diagonal matrices, so the robot can be thought of as moving its arm by changing the equilibrium angles, \({\varvec{q}}_{eq}\), of viscoelastic torsional springs located at its joints. We use compliant springs that are virtual, but they could potentially be implemented as real physical springs. In the absence of disturbances such as gravity or other externally applied forces, the robot’s arm will eventually settle such that its joint angles equal the commanded equilibrium angles, \({\varvec{q}}={\varvec{q}}_{eq}\). Hogan et al refer to a set of equilibrium angles over time as a virtual trajectory.

While our implementation is mathematically analogous to proportional derivative control (PD control), the semantics are different. For example, the equilibrium angles, \({\varvec{q}}_{eq}\), are not desired joint angles, and a difference between the equilibrium angles and the current joint angles, \({\varvec{q}}\), is not necessarily error. Likewise, \(K_{p}\) and \(K_{d}\) are more aptly thought of as defining a desirable mechanical impedance for the joints (e.g., high compliance for our system) rather than as proportional and derivative gains to reduce errors over time.

MPC runs on top of this simple joint impedance controller commanding the equilibrium angles (\({\varvec{q}}_{eq}\)) based on the measured external forces, joint angles, joint velocities, current end effector position, and the desired end effector position (see Fig. 2). The model predictive controller uses a model that explicitly incorporates a model of the underlying simple joint impedance controller. In our implementation, the higher bandwidth simple joint impedance controller defined in Eq. 1 and shown in Fig. 2 runs at 1 kHz. MPC runs in an outer control loop at approximately 20–100 Hz (depending on the robot platform). Between updates of the equilibrium angles, \({\varvec{q}}_{eq}\), the simple joint impedance controller holds them constant.

Fig. 2
figure 2

This is the block diagram of major components for dynamic MPC

3.2 Robot equations of motion

In order to use MPC, we need models to predict the change in the robot’s state given control inputs and disturbances. In the following sections, we derive nonlinear contact and robot state models for a serial manipulator in contact with its environment. We then present linear contact and robot state models that approximate these nonlinear models and which can be used for dynamic MPC.

We start with the joint space Lagrangian formulation for a serial torque controlled robot manipulator found in robotics textbooks such as Craig (2005) and Siciliano et al. (2011):

$$\begin{aligned} {\varvec{M}}({\varvec{q}}){\varvec{\ddot{q}}} + {\varvec{C}}({\varvec{q}},{\varvec{\dot{q}}}){\varvec{\dot{q}}}\, + \, {\varvec{F}}({\varvec{\dot{q}}})+{\varvec{G}}({\varvec{q}}) = {\varvec{\tau }}_{ext}+{\varvec{\tau }}_{control} \end{aligned}$$
(2)

The variable \({\varvec{\ddot{q}}}\) represents the joint accelerations and all terms in Eq. 2 are in joint space. On the left hand side of the equation, \({\varvec{M}}({\varvec{q}})\) is the configuration dependent mass matrix, \({\varvec{C}}({\varvec{q}},{\varvec{\dot{q}}})\) represents the Coriolis and centrifugal terms, \({\varvec{G}}({\varvec{q}})\) is the configuration dependent gravity term, and \({\varvec{F}}({\varvec{\dot{q}}})\) is the resultant joint torque vector due to both viscous and Coulomb friction. The terms on the right hand side of Eq. 2 represent the control torques, \(\tau _{control}\), and the external torques \({\varvec{\tau }}_{ext}\) due to external forces applied by the environment. These external torques can be defined as follows if we assume that all contact forces are point contacts:

$$\begin{aligned} {\varvec{\tau }}_{ext}=\sum \limits _{i=1}^{N}{\varvec{J}}_{c_i}^{T} ({\varvec{q}}){\varvec{f}}_{i}^{ext} \end{aligned}$$
(3)

In Eq. 3, N is the number of total contacts at the current instant in time. \({\varvec{f}}_{i}^{ext}\) is the current contact force at the ith contact and \({\varvec{J}}_{c_i}({\varvec{q}})\) is the configuration dependent geometric contact Jacobian at that contact location.

Figure 3, illustrates our dynamic model. The model incorporates link inertia, as well as stiffness and damping at the robot’s joints, and contact as bi-linear springs. The springs and dampers at the joints model the underlying joint-space impedance control.

Fig. 3
figure 3

Graphical representation of the dynamics that are included in our robot model. The links have mass and rotational inertia. The blue elements represent the simple joint impedance control and the red springs represent contact with the world (Color figure online)

3.2.1 Contact model

Our system models each contact as a linear spring that only applies force normal to the surface of the robot’s arm. Any forces tangent to the surface of the arm are ignored. At each time step, the system creates a new mechanical model with a linear spring at each location on the arm at which contact has been detected. In our implementation, the robot detects contact when a measured contact force exceeds a threshold.

The model predicts that the force, \({\varvec{f}}_{i}^{ext}\), applied to the robot’s arm at contact location i will change by \(\varDelta {\varvec{f}}_{i}^{ext}\) when the associated location on the arm moves by \(\varDelta {\varvec{x}}_{c_i}\), where

$$\begin{aligned} \varDelta {\varvec{f}}_{i}^{ext} = -{\varvec{K}}_{c_i} \varDelta {\varvec{x}}_{c_i} \end{aligned}$$
(4)

\({\varvec{K}}_{c_i}\) is a positive semidefinite Cartesian stiffness matrix, which has the form \({\varvec{K}}_{c_i}={\varvec{\hat{n}}}_{c_i} k_{c_i} {\varvec{\hat{n}}}_{c_i}^T\). The variable \({\varvec{\hat{n}}}_{c_i}\) is a unit vector normal to the robot’s arm at contact i, and \(k_{c_i}\) is a positive scalar representing the stiffness associated with motions parallel to this surface normal. Motion in directions orthogonal to the surface normal have zero stiffness and hence result in no predicted change in contact force. Notably, this contact model predicts adhesive forces when the robot moves away from a contact location. As such, unilateral contact models present an interesting direction for future work (Erez et al. 2012; Posa and Tedrake 2013; Stewart and Trinkle 2000).

Our model approximates the translation of contact location i as

$$\begin{aligned} \varDelta {\varvec{x}}_{c_i} \approx {\varvec{J}}_{c_i}({\varvec{q_0}})({\varvec{q}} - {\varvec{q_0}}) \end{aligned}$$
(5)

where \({\varvec{q_0}}\) is the initial joint configuration, \({\varvec{q_0}}={\varvec{q}}(t_0)\), and \({\varvec{q}}\) is a new joint configuration, \({\varvec{q}}={\varvec{q}}(t)\) with \(t>t_0\). Hence,

$$\begin{aligned} \varDelta {\varvec{f}}_{i}^{ext} \approx -{\varvec{K}}_{c_i} {\varvec{J}}_{c_i}({\varvec{q_0}})({\varvec{q}} - {\varvec{q_0}}) \end{aligned}$$
(6)

and

$$\begin{aligned} {\varvec{f}}_{i}^{ext} \approx {\varvec{f}}_{i}^{measured} - {\varvec{K}}_{c_i}{\varvec{J}}_{c_i}({\varvec{q_0}})\left( {\varvec{q}} - {\varvec{q_0}}\right) \end{aligned}$$
(7)

where \({\varvec{f}}_{i}^{measured}\) is the force normal to the robot arm’s surface measured at contact i for every time step of the controller.

Combining the predicted contact forces, \({\varvec{f}}_{i}^{ext}\), with Eq. 3, gives the following prediction for the external torque, \({\varvec{\tau }}_{ext}\):

$$\begin{aligned} {\varvec{\tau }}_{ext} \approx \varSigma _{i=0}^{N}{\varvec{J}}_{c_i}^{T}({\varvec{q}})\big ({\varvec{f}}_{i}^{measured} - {\varvec{K}}_{c_i}{\varvec{J}}_{c_i}({\varvec{q_0}})\left( {\varvec{q}} - {\varvec{q_0}}\right) \big ) \end{aligned}$$
(8)

The controller also regulates the change in the contact forces using the following signed scalar:

$$\begin{aligned} \varDelta f_{i}^{ext} = \hat{{\varvec{n}}}_{c_i}^T\varDelta {\varvec{f}}_{i}^{ext} \approx -\hat{{\varvec{n}}}_{c_i}^T {\varvec{K}}_{c_i}{\varvec{J}}_{c_i}({\varvec{q_0}})\big ({\varvec{q}} - {\varvec{q_0}}\big ) \end{aligned}$$
(9)

3.2.2 Linear robot model

For our dynamic robot model, we neglect joint friction (\({\varvec{F}}({\varvec{\dot{q}}}) = 0\)). If viscous joint friction were significant, we could incorporate it into our model of joint damping in the simple impedance controller.

We combine Eqs. 12 and 8, and rearrange them to obtain a state-space representation which gives the following:

$$\begin{aligned} \left[ \begin{array}{c} {\varvec{\ddot{q}}}\\ {\varvec{\dot{q}}} \end{array}\right]= & {} {\varvec{A}}\left[ \begin{array}{c} {\varvec{\dot{q}}}\\ {\varvec{q}} \end{array}\right] +{\varvec{B}}\left[ \begin{array}{c} \begin{array}{c} {\varvec{q}}_{eq}\end{array}\\ \sum \limits _{i=1}^{N}{\varvec{J}}_{c_i}^{T}({\varvec{q}}){\varvec{f}}_{i}^{measured}\\ {\varvec{q}}_0 \end{array}\right] \end{aligned}$$
(10)

where

$$\begin{aligned} {\varvec{A}}&= \left[ \begin{array}{cc} {\varvec{A_{11}}} &{} \quad {\varvec{A_{12}}}\\ {\varvec{I}} &{} \quad {\varvec{0}} \end{array}\right] \end{aligned}$$
(11)
$$\begin{aligned} {\varvec{A_{11}}}&= -{\varvec{M}}({\varvec{q}})^{-1}({\varvec{K}}_{d}+{\varvec{C}}({\varvec{\dot{q}}},{\varvec{q}})) \end{aligned}$$
(12)
$$\begin{aligned} {\varvec{A_{12}}}&= -{\varvec{M}}({\varvec{q}})^{-1}\left( {\varvec{K}}_{p}+\sum \limits _{i=1}^{N}({\varvec{J}}_{c_i}^T({\varvec{q}}) {\varvec{K}}_{c_i} {\varvec{J}}_{c_i}({\varvec{q_0}}))\right) \end{aligned}$$
(13)
$$\begin{aligned} {\varvec{B}}&= {\varvec{M}}({\varvec{q}})^{-1} \left[ \begin{array}{ccc} {\varvec{K}}_{p} &{} \, {\varvec{I}} &{} \, \sum \limits _{i=1}^{N}({\varvec{J}}_{c_i}^T({\varvec{q}}) {\varvec{K}}_{c_i} {\varvec{J}}_{c_i}({\varvec{q_0}})) \\ {\varvec{0}} &{} \,{\varvec{0}} &{} \,{\varvec{0}} \end{array}\right] \end{aligned}$$
(14)

The matrix \({\varvec{I}}\) is an m by m identity matrix, where m is the number of joints. \({\varvec{M}}({\varvec{q}})\) is, by definition, a positive definite matrix and therefore invertible. We derived the symbolic form of the mass matrix, Coriolis, and gravity terms using a symbolic Python library (see Sousa 2014). Our linear model approximates \({\varvec{M}}({\varvec{q}})\), \({\varvec{C}}({\varvec{q}}, {\varvec{\dot{q}}})\) and all contact Jacobians (\({\varvec{J}}_{c_i}({\varvec{q}})\)) as constants for the short time horizon of length H over which dynamic MPC makes predictions, such that:

$$\begin{aligned}&{\varvec{M}}({\varvec{q}}(t)) \approx {\varvec{M}}({\varvec{q}}(t_0)) \quad t_0 \le t \le t_0+H \end{aligned}$$
(15)
$$\begin{aligned}&{\varvec{C}}({\varvec{q}}(t), {\varvec{\dot{q}}}(t)) \approx {\varvec{C}}({\varvec{q}}(t_0), {\varvec{\dot{q}}}(t_0)) \quad t_0 \le t \le t_0+H \end{aligned}$$
(16)
$$\begin{aligned}&{\varvec{J}}_{c_i}({\varvec{q}}(t)) \approx {\varvec{J}}_{c_i}({\varvec{q}}(t_0)) \quad t_0 \le t \le t_0+H \end{aligned}$$
(17)

The variable H is a continuous amount of time and can be defined such that \(H = (H_u+H_y)\varDelta t_d\). \(\varDelta t_d\) is the duration of the continuous time step used for the discretization of our prediction model. Using the approximations of Eqs. 15 through 17, we can discretize our system with the matrix exponential (see Brogan 1991). This entails performing the following calculation:

$$\begin{aligned} {\varvec{A}}_d&= \textit{e}^{{\varvec{A}}\varDelta t_d} \end{aligned}$$
(18)
$$\begin{aligned} {\varvec{B}}_d&= \left( \int _{o}^{\varDelta t_d}\textit{e}^{{\varvec{A}}\lambda }\textit{d}\lambda \right) {\varvec{B}} \end{aligned}$$
(19)

To approximate the matrix exponential, we use a Padé approximation (see Moler and Van Loan 2003), which has favorable computational performance and stability with respect to other methods. We then formulate the following discrete-time state-space equations:

$$\begin{aligned} \left[ \begin{array}{c} {\varvec{\dot{q}}}[k+1]\\ {\varvec{q}}[k+1] \end{array}\right] = {\varvec{A}}_{d}\left[ \begin{array}{c} {\varvec{\dot{q}}}[k]\\ {\varvec{q}}[k] \end{array}\right] +{\varvec{B}}_{d} \left[ \begin{array}{c} \begin{array}{c} {\varvec{q}}_{eq}[k] +\varDelta {\varvec{q}}_{eq}[k] \end{array}\\ \sum \limits _{i=1}^{N}{\varvec{J}}_{c_i}^{T}{\varvec{f}}_{i}^{measured}[0]\\ {\varvec{q}}[0] \end{array}\right] \end{aligned}$$
(20)

In these discrete-time equations, [k] and \([k+1]\) represent the current and next time steps. The equilibrium angles for the next time step are \({\varvec{q}}_{eq}[k+1]\) with \({\varvec{q}}_{eq}[k+1] = {\varvec{q}}_{eq}[k]+\varDelta {\varvec{q}}_{eq}[k]\). Variables with the time index [0] represents the value for that variable at time \(t_0\). The state-space matrices \({\varvec{A}}_{d}\) and \({\varvec{B}}_{d}\) are constant over the given prediction horizon H. These equations are a linear and discretized approximation of the original nonlinear and continuous dynamics.

Section 6 shows that using a horizon of 5 with dynamic MPC outperformed quasistatic MPC across 4800 simulated trials in terms of speed, success rates and regulation of maximum contact forces. Section 5 shows that our linear model can predict the future state of a robot with low error.

In Sect. 5, we provide empirical support for the fidelity of our linear model, and our evaluations of dynamic MPC demonstrate that this model can be used effectively. The approximations we made result in a computationally efficient formulation that is straightforward to implement. Conventional linearization would require the calculation of numerous derivatives at each time step, which would be complicated by the number of contacts changing over time.

3.2.3 Impulse–momentum Impact Model

One important motivation for introducing dynamics was to control the joint velocities explicitly. Although we could limit all the joint velocities to remain below constant values, this is unnecessarily stringent. There are many configurations where some links can move faster than others without incurring high contact forces during unexpected impact. To capture this property, we used a joint-space impulse model as shown in Featherstone and Orin (2008) to model the dynamics of impacts. Impulse–momentum models have been used for robotic control before, most often for work on walking robots (Grizzle et al. 2001). We express our impulse–momentum model as

$$\begin{aligned} {\varvec{M}}\left( {\varvec{q}}\right) \left( {\varvec{\dot{q}}}^{+} - {\varvec{\dot{q}}}^{-}\right) \approx {\varvec{\tau }}_{impact} \varDelta t_{impact} \end{aligned}$$
(21)

where \({\varvec{\dot{q}}}^{-}\) represents the joint velocities just before impact, \({\varvec{\dot{q}}}^{+}\) represents the joint velocities just after impact, and \(M({\varvec{q}})\) is the joint-space mass matrix. \({\varvec{\tau }}_{impact}\) is the average torque vector due to the force occurring during the collision and \(\varDelta t_{impact}\) is the duration of the impact. The left hand side of Eq. 21 represents the change in momentum due to impact and the right hand side represents average torques operating over a short period with no displacement or work occurring.

For an unexpected impact on a single link, we regulate the resulting joint torque \(\tau _{j,impact}\) for joint j by expressing it as a product of allowable contact force threshold, \(f_{threshold}\), and a moment arm \(d_{j,impact}\), giving

(22)

For all of the work in this paper, \(d_{j,impact}= d_{impact}\) for all j, where \(d_{impact}\) is a constant scalar (see Fig. 4).

With regard to the change in momentum, we assume that in the worst case scenario of a perfectly elastic collision

$$\begin{aligned} {\varvec{\dot{q}}}^{+} = -{\varvec{\dot{q}}}^{-} \end{aligned}$$
(23)

and

$$\begin{aligned} {\varvec{\dot{q}}}^{+} - {\varvec{\dot{q}}}^{-} = 2{\varvec{\dot{q}}} \end{aligned}$$
(24)

We assume that a collision could happen at any time given the current joint velocities. In order to limit the contact forces from a collision to be below the force threshold, \(f_{threshold}\), dynamic MPC uses the following constraint:

$$\begin{aligned} \vert 2{\varvec{M}}({\varvec{q}}){\varvec{\dot{q}}} \vert \le {\varvec{d}}_{impact}f_{threshold}\varDelta t_{impact} \end{aligned}$$
(25)
Fig. 4
figure 4

This is the visualization of the joint-space impulse–momentum constraint for the first joint of a two link arm

With this model of impulse–momentum between a contact force and joint torque, the worst case scenario for impact location for joint j is when \(d_{impact}\) approaches zero. Figure 4 shows a visual representation of how \({\varvec{\tau }}_{impact}\) is calculated for the first joint. From Fig. 4, it is clear that as \(d_{impact}\) goes to zero, \(f_{threshold}\) must go to infinity to give the same average torque. However, letting \(d_{impact}\) go to zero is unnecessarily conservative for formulating a joint velocity constraint. We wish to avoid the frequent occurrence of high impact forces, but we are not designing the controller for the worst case scenario. To achieve good empirical performance, we tuned \(\varDelta t_{impact}\) for a robot while holding \(d_{impact}\) constant. Since \(\varDelta t_{impact}\) and \(d_{impact}\) are both constant scalars that only appear in this equation, this is equivalent to tuning their product.

Our impulse–momentum constraint described in Eq. 25 does not take into account joint compliance. In addition, contact with some link geometries could cause \(d_{impact}\) to be small and result in high forces. However, our evaluations suggest that this collision model is effective in practice. This model also has the advantage of being linear.

3.3 Limits and saturation in robot model

A benefit of using MPC is that the robot can plan over a short time horizon, adhering to physical constraints on state and input variables in addition to other user defined constraints (such as force thresholds using models in Sect. 3.2.1). In terms of physically meaningful constraints, we include limits on physical joint angles and high joint velocities (as presented in Sect. 3.2.3). We also include a model of actuator saturation. This takes one of two forms, either (1) limiting the amount that the desired joint angle can be changed at each time step (see Eq. 34), or (2) limiting the total torque each joint can apply (see Sect. 7.2).

3.4 Dynamic model predictive controller

Using the equations of motion from Eq. 20, along with the contact models and other constraints, we created a model predictive controller that accounts for dynamics (dynamic MPC). A major difference from quasistatic MPC (see Jain et al. 2013), which used a single time step time horizon, is that dynamic MPC has a horizon that consists of multiple time steps. Dynamic MPC uses a horizon of length \(H_u\) during which the controller can apply control effort and a subsequent horizon of \(H_y\) time steps during which the controller applies no control effort. This is a common approach to improve stability and robustness and can be found in Rossiter (2003). The values we used for \(H_u\) and \(H_y\) are presented in relation to specific tests in subsequent sections. The actual physical time that the controller predicts into the future depends on the rate (\(\frac{1}{\varDelta t_d}\)) at which the controller runs. For example, using a total time horizon of ten discrete steps, a time step \(\varDelta t_d=0.01\) s would result in the controller predicting up to 0.1 s into the future, and would imply that the controller expects to run at 100 Hz.

The procedure for MPC is to define an optimization problem with a cost function and constraints that are functions of the state variables and that include a model of the system dynamics as an equality constraint. After solving the current convex optimization problem to find a sequence of commanded changes to the equilibrium angles, \(\varDelta {\varvec{q}}_{eq}[k] \) for \(k = 0, 1, \ldots , H_{u}+H{y}\), the controller only uses \(\varDelta {\varvec{q}}_{eq}[0]\) before solving a new convex optimization problem based on updated information. Equations 26 through 35 show the complete optimization problem that dynamic MPC creates and solves at each time step.

$$\begin{aligned}&\underset{\varDelta {\varvec{q}}_{eq}}{{\mathbf {minimize}}} \nonumber \\&\quad \alpha \left\| \varDelta {\varvec{x}}_{des}-{\varvec{J}}_{ee} \left( {\varvec{q}}[H_u+H_y+1]-{\varvec{q}}[0]\right) \right\| ^{2} \end{aligned}$$
(26)
$$\begin{aligned}&\quad + \beta \sum \limits _{k=0}^{H_u}\sum \limits _{i=1}^{N}\max \Big (\hat{{\varvec{n}}}^{T}_{c_i}{\varvec{K}}_{c_i}{\varvec{J}}_{c_i}({\varvec{q}}[k+1]-{\varvec{q}}[0])\nonumber \\&\quad - \left( f_{threshold}-\left\| {\varvec{f}}_{i}^{measured}[0]\right\| \right) , {\varvec{0}}\Big ) \end{aligned}$$
(27)
$$\begin{aligned}&\quad +\zeta \sum \limits _{k=0}^{H_u}\sum \limits _{i=1}^{N}\max \nonumber \\&\quad \times \Big (\vert \hat{{\varvec{n}}}^{T}_{c_i}{\varvec{K}}_{c_i}{\varvec{J}}_{c_i}({\varvec{q}}[k+1]-{\varvec{q}}[k])\vert - \varDelta f_{rate,i}, {\varvec{0}}\Big ) \end{aligned}$$
(28)
$$\begin{aligned}&\quad + \mu \sum _{k=0}^{H_u}\left\| \varDelta {\varvec{q}}_{eq}[k]\right\| ^{2} \end{aligned}$$
(29)
$$\begin{aligned}&\mathbf{subject~to}:\,\, (\forall k=0,1,\ldots ,H_u+H_y) \nonumber \\&\quad \left[ \begin{array}{c} {\varvec{\dot{q}}}[k+1]\\ {\varvec{q}}[k+1] \end{array}\right] = {\varvec{A}}_{d}[k]\left[ \begin{array}{c} {\varvec{\dot{q}}}[k]\\ {\varvec{q}}[k] \end{array}\right] +{\varvec{B}}_{d}[k] \left[ \begin{array}{c} \begin{array}{c} {\varvec{q}}_{eq}[k]+\varDelta {\varvec{q}}_{eq}[k]\end{array} \\ \sum \limits _{i=1}^{N}{\varvec{J}}_{c_i}^{T}{\varvec{f}}_{i}^{measured}[0]\\ {\varvec{q}}[0] \end{array}\right] \nonumber \\ \end{aligned}$$
(30)
$$\begin{aligned}&{\varvec{q}}_{eq}[k+1] = {\varvec{q}}_{eq}[k]+\varDelta {\varvec{q}}_{eq}[k] \end{aligned}$$
(31)
$$\begin{aligned}&{\varvec{q}}[k+1] \leqq {\varvec{q}}_{max} \end{aligned}$$
(32)
$$\begin{aligned}&{\varvec{q}}[k+1] \geqq {\varvec{q}}_{min} \end{aligned}$$
(33)
$$\begin{aligned}&\vert \varDelta {\varvec{q}}_{eq}[k]\vert \leqq \varDelta {\varvec{q}}_{max,eq} \end{aligned}$$
(34)
$$\begin{aligned}&\vert 2{\varvec{M}}({\varvec{q}}){\varvec{\dot{q}}}[k+1]\vert \leqq d_{impact} f_{threshold} \varDelta t_{impact} \end{aligned}$$
(35)

Our cost function for dynamic MPC consists of a terminal cost (Eq. 26) which attempts to move the end effector towards a desired goal position, a cost on non-adhesive forces above a desired force threshold (Eq. 27), a cost on changing the contact force faster than a specified rate (Eq. 28), and a cost on control effort (Eq. 29). Notably, Eq. 27 does not penalize predicted adhesive forces nor predicted contact forces with magnitudes below \(f_{threshold}\). Non-adhesive forces with magnitudes above \(f_{threshold}\) incur a linear penalty.

In our original formulation, we defined constraints on the allowable contact force. However, due to feasibility issues with convergence of the optimization, we removed these constraints and added terms related to contact forces to the cost function. The constraints in our current formulation consist of the discrete dynamic equations (Eqs. 30 and 31), limits on joint actuation and angles (see Eqs. 32, 3334) and the joint velocity constraint described in Sect. 3.2.3, (see Eq. 35). The actuation limits described in Eq. 34 are for the simulation testbed. For the full-sized real robot DARCI, this actuator model takes a different form to limit the torques applied at the joints. This form is described in Sect. 7. The variable \(\varDelta x_{des}\) is a waypoint that is a fixed step size in a straight line towards the target position unless the distance to the target position is smaller than the nominal step size. Equation 35 describes the impulse–momentum constraint from Sect. 3.2.3. The result of this optimization at each time step is a series of \(\varDelta {\varvec{q}}_{eq}\) values from which the controller executes only the first.

We used an optimization tool named CVXGEN to generate efficient C code that solves this optimization problem (see Mattingley and Boyd 2009, 2010, 2012 for details about this tool for web-based convex optimization code generation). Our system uses Python code to generate the data required by this C code. For the average number of contacts experienced during our trials, this Python code executed in around 1–4 ms. The C code solver requires arguments with fixed dimensions. It solved this optimization problem with multiple contacts and a multi-step horizon in around 4–10 ms, depending on the number of degrees of freedom of the robot being controlled.

3.5 Evaluation of controller

Our strategy in evaluating our controllers is based on empirical measures of success at reaching a specified goal in clutter while keeping the contact forces low. Due to the nature of clutter, robots will not always succeed at reaching a target and the contact that occurs can vary widely. To represent this variability, we randomly generated simulated environments and representative real physical environments to represent the complexity of real clutter. We then used success rates, contact forces, and time-to-complete to evaluate dynamic MPC and compare it with our prior work. Details of quasistatic MPC, which we use as a base-line comparison in simulation can be found in Jain et al. (2013) and Killpack (2013).

Infrastructure and testbeds that we used to test our controller include two different software simulation testbeds, and a human-scale robot with torso, mobile base, and two 7 degree-of-freedom arms. Section 4 describes these testbeds in detail.

4 Testbeds and infrastructure

4.1 Software simulation testbeds

We developed two simulation testbeds for controller development and testing. We first performed tests using the MATLAB Robotics Toolbox (Corke 1996) which gives more control over the numerical integration methods and the contact models that are used (for more details see Killpack 2013; Killpack and Kemp 2013). We also used the Open Dynamics Engine (ODE Smith et al. 2011), which is an open source physics SDK. Both simulations included a simple joint impedance controller running at 1 kHz. They also both simulated the same three link arm described in Jain et al. (2013), Killpack (2013), and Killpack and Kemp (2013). This arm has kinematics, mass and joint limits similar to a human holding a hand outstretched and manipulating in a plane parallel to the ground at shoulder height (see Fig. 5). The three joints had stiffnesses of 30, 20, and 15 N-m/rad, and damping values of 15, 10, 8 N-m-s/rad, which correspond to the torso, shoulder, and elbow joints.

4.1.1 MATLAB simulation using robotics toolbox

We used our MATLAB simulation testbed to prototype model predictive controllers with the dynamic models presented in Sect. 3.4. This higher fidelity simulation was valuable for testing the fidelity of our dynamic model as we found that although ODE was better at simulating large numbers of contact, it had lower numerical stability and integration accuracy (see Killpack 2013).

Our MATLAB implementation used an explicit spring-damper model for simulating contact. Since there is no native geometry collision library with MATLAB, we also simulated discrete tactile sensing elements (taxels), that were spaced one centimeter apart along the center of each link to which our implementation assigned any simulated contact force. All objects were assumed to have a 1.5 cm radius and a spring-damper contact was simulated when the extent of the arm (which was assumed to have a 1.5 cm radius) was within the extent of the object. Objects were simulated as having a stiffness of 5000 N/m and damping value of 10 N-s/m for the tests in Sect. 5.

4.1.2 Open source dynamics engine (ODE)

The ODE software testbed allowed us to simulate a large number of trials and prototype our controllers in highly cluttered workspaces. This was particularly beneficial as we could generate large data sets to test controller performance.

For this platform we simulated tactile sensors covering the entire surface of the arm with the same default density of 100 taxels per meter as in MATLAB. Other specifics (e.g. taxel assignment for each contact force, link geometry, obstacle properties) about our ODE implementation are included in Killpack and Kemp (2013). Figure 5 shows a visualization of the simulated robot, force sensing taxels as well as an example of our simulated environments composed of fixed, rigid cylindrical obstacles (red circles).

Fig. 5
figure 5

Visualization of our simulated robot arm showing the similarity in terms of kinematics to a human reaching in clutter. For the ODE software testbed, this shows the three link planar arm reaching to a goal location (cyan) in a volume consisting of rigid cylinders that are fixed (red). The base joint of the arm is rigidly fixed to the world. The orange points on the arm are 1 cm apart and represent the centers of each tactile sensor. The green arrows are the contact force vectors and each red arrow is the component of the contact force normal to the surface of the arm which our sensor can measure (Color figure online)

Fig. 6
figure 6

Mobile manipulator DARCI produced by Meka Robotics. DARCI has a tactile sensing sleeve which has an external blue covering that can be seen on the left arm (Color figure online)

4.2 Real robot platform: DARCI

We used a humanoid mobile manipulator called DARCI, a Meka M1 Mobile Manipulator (see Fig. 6). DARCI has two 7 degree-of-freedom arms that have series-elastic actuators (SEAs), are torque controlled and include gravity compensation. Joint stiffness and damping gains for the simple joint impedance controller are 43, 43, 43, 43, 2.6, 3.4, 3.4 N-m/rad and 2.6, 4.3, 0.64, 0.64, 0.064, 0.090, 0.090 N-m-s/rad where they are listed from the most proximal to the most distal joint. The first three values in each list correspond to the shoulder, the fourth value to the elbow, and the last three values to the wrist. We used ROS to send commanded joint angles (\({\varvec{q}}_{eq}\)) to the low-level impedance controller. The computer we used to run our MPC solver in real time had a 32-bit Ubuntu operating system with 16 GB of RAM and a 3.40 GHz Intel Core i7-3770 CPU. The solver only used a single core, running as a single process.

4.3 Tactile sensing hardware

We describe specifics of our tactile sensor implementation in Bhattacharjee et al. (2013). The tactile sensor we used on DARCI is a sleeve pulled over the end effector, wrist, and forearm, that consists of five layers of fabric. An inner and outer layer that protect and insulate the sensor’s interior, and two layers of conductive fabric that go on either side of a layer of electrically resistive fabric that decreases resistance as the pressure on the cloth increases. One of the conductive layers is split into separate rectangles; where each rectangle is a taxel. The sleeve has 25 taxels total consisting of a single taxel at the tip of the end effector and an array of 24 taxels with 4 taxels around the arm’s circumference and 6 along the arm’s length. When a taxel detects contact, the system uses the geometric center of the taxel as the location of contact. Figure 6 shows the tactile sensor on the left arm of DARCI.

We calibrated the tactile sensor in order to convert the raw sensor measurements to forces in Newtons. We fit an exponential curve to a plot of tactile sensor readings versus ground truth from a force–torque sensor when pushing on a taxel with various forces. We used the calibration curve from one taxel on all taxels. Although the values reported by these sensors vary in complex ways based on other parameters such as contact area, pressure, and hysteresis, when quasistatic MPC (see Bhattacharjee et al. 2013) and dynamic MPC (see Sect. 8) have used these values they have also performed well with respect to ground-truth forces.

In this paper, our systems detected contact when a tactile sensor’s measurements exceeded a threshold. This threshold was 0.5 N for the simulated robot and 0.2 N for the real robot.

5 Dynamic model prediction accuracy

The performance of MPC is limited by the quality of the model used to make predictions. For dynamic MPC, it uses a linear approximation of the nonlinear robot dynamics. This approximation assumes that over the prediction horizon changes to the mass and Coriolis matrices are negligible and can thus be treated as constants (see Sect. 3.2.2). In this section we examine the accuracy of our simplified dynamic model for the simulated three-link planar arm in MATLAB as described in Sect. 4.1.1.

5.1 Testing accuracy of open loop dynamic model prediction

We first randomly generated 10 different initial joint configurations that were within the physical joint limits of the robot arm. We then randomly generated 5 different goals that were within the same workspace dimensions as our ODE simulation tests in Sect. 6 and in Jain et al. (2013). Finally we also randomly generated 20 fixed objects within that same workspace. The initial joint configurations (differently colored linkages), the goal positions (green stars) and fixed objects (red circles) are shown in Fig. 7. For each initial configuration we ran two sets of tests. For the first set we removed all the objects so the arm reached in free space, while for the second set we used the objects and the arm made contact.

Fig. 7
figure 7

This graph shows the initial joint configurations for 10 different starting positions for a three link planar arm. Additionally, the green stars are the goals that the arm reached to for each starting configuration and red circles are fixed objects (Color figure online)

For both sets of trials, at every time step, we solved the model predictive control optimization problem. This solution results in a control sequence over the control horizon (\(H_u\)) as well as a prediction of how that control sequence will change the joint velocities and angles over that same horizon. At each time step we simulated the motion of the arm using our linear model as if we were to apply all of the control steps over the entire control horizon instead of just the first step. Then, using the full nonlinear equations, we also simulated and compared the actual change in joint angles and velocities to the change predicted by our linear dynamic model. After this simulation over just the horizon (\(H_u\)), we would reset the state of the arm as if we had only taken the first control step and repeat the process for the next time step. This means that as the arm progressed towards the goal using MPC (executing only the first control input), we also simulated with both our linear and nonlinear models five steps into the future to check the error in our linear prediction model.

In order to quantify the error of the prediction model, we group measurements from all three joints and looked at statistics on the error from the actual and predicted joint angles across the horizon of 5 control steps (\(H_u = 4\) since the implementation starts with an index of 0).

5.1.1 Prediction of arm moving in free space

We first looked at the median of the absolute value of prediction errors over each of the five time steps in the horizon for the arm moving in free space. We plot this in Fig. 8. The error bars also show the 25th and 75th percentile error values. As expected, the error increases for predictions farther in the future. However, this does not convey the relationship between the magnitude of the prediction error and the magnitude of the actual change in the joint angles. In Fig. 8, we see the median of the actual change in the joint angles (in red) as compared to the median of the error (in blue). We can see that the median error is much smaller than the median change in the joint angles.

Fig. 8
figure 8

Top: This plot shows how the median error of the joint angle prediction changed with each step in the horizon. The error bars are the 25th and 75th percentile error. Bottom: This plot shows a comparison between the magnitude of the median joint angle error versus the median change in joint angle at every step in the horizon (Color figure online)

Another way to examine the error is to look at plotting the change in the actual joint angle versus the predicted change from one step to the next. If our prediction is perfect, then plotting the actual change versus the predicted change should be a line with a slope of one. In Fig. 9 we took the data from movement in free space for all three joints across all of the predictions for only the first and fifth steps in the horizon and plotted the actual change in joint angles against the predicted change in joint angles as scatter plots. The red dotted line is the ideal relationship (one to one). The plot in Fig. 9 is for all of the data from the first step in the horizon and the plot in Fig. 9 is for the fifth step in the horizon. We can again see that the prediction model performs well for free space motion.

Fig. 9
figure 9

This is a scatter plot for free space motion showing the actual change in joint angles versus the predicted change. The red dotted line is the ideal if our prediction was perfect. Top: The data for the first step in the horizon. Bottom: The data for the fifth step in the horizon (Color figure online)

5.1.2 Prediction of arm moving in clutter with contact

We ran the same tests for the same initial joint configurations but this time included fixed objects so that the arm would make contact while reaching. For this test, in order to evaluate the prediction model, the simulation again executed the entire open loop control sequence over the horizon (5 steps). Note that for our normal implementation of the controller, when contact occurs the contact is included at the next time step when the optimization is reformulated. However, for this test of our linear model, when the robot first makes or breaks contact our model will be inaccurate for however many steps are left in the open loop control horizon the robot is executing. We again show the median of predicted errors and the direct comparison between the median of the prediction error and the median of the change for the joint angles in Fig. 10.

Fig. 10
figure 10

Top: If we allow the arm to make contact, this plot shows how the median error of the joint angle prediction changed with each step in the horizon. The error bars are the 25th and 75th percentile error. Bottom: This shows a comparison between the magnitude of the median joint angle error versus the median change in joint angle at every step in the horizon

The error in the prediction is higher relative to the actual change in joint position than it was for movement in free space. However, the error is still much smaller than the actual change.

We show the scatter plot of the actual change in the joint angle versus the predicted change for movement with contact in Fig.  11.

Fig. 11
figure 11

This is a scatter plot for when the arm is allowed to make contact, and shows the actual change in joint angles versus the predicted change. The red dotted line is the ideal if our prediction was perfect. Top: The data for the first step in the horizon. Bottom: The data for the fifth step in the horizon (Color figure online)

The error in the prediction is more apparent when moving in contact than when moving in free space. Although the error and the variance are larger at the the end of the horizon (step 5), when dynamic MPC is actually running it updates its contact information at every time step, which would reduce errors due to contact being made or broken.

In this section, we have shown that using our dynamic model, the change in predicted joint angles correlates strongly with the actual change in the joint angles. We also have shown that the error in our predictions due to our linear approximations is small relative to the actual change in the robot state.

6 Dynamic versus quasistatic MPC for reaching in simulated clutter

In this section we compare dynamic MPC against quasistatic MPC from Jain et al. (2013) in ODE using the testbed shown in Fig. 5.

In order to compare the performance of our controllers, we ran four sets of 1,200 trials each in ODE for our dynamic controller and for quasistatic MPC. We generated these four sets by varying the density of clutter between 20 fixed objects and 80 fixed objects, and the value of the force threshold between 5 and 25 N. For quasistatic MPC, we used the same data for the 5 N threshold that was reported in Jain et al. (2013). However, for the 25 N threshold, we also generated new results for quasistatic MPC.

Our initial attempts at tuning the parameters of dynamic MPC involved setting the gain on the position cost term first (Eq. 26) and then trying to manually vary the other parameters to achieve desirable performance. However, our multi-objective cost function along with other tunable parameters such as the waypoint magnitude size made tuning the controller difficult and sometimes unintuitive. In order to search the parameter space for our controller, we used simulated annealing on 15 randomly selected environments in ODE (5 with 20 fixed objects and 10 with 80 fixed objects) that were not part of the data set we used for evaluating the controllers. We ran these trials with two different maximum desired force thresholds of 5 and 15 N, and accumulated the cost across all 30 trials. Our cost function for simulated annealing was based on four terms: a cost on the time to complete a trial; a cost on forces over the threshold; a cost on the maximum force in a trial; and a cost on failure to reach the goal. This optimization resulted in a Pareto front that gave us a better intuition for the trade-off between control parameters while at the same time improving performance significantly. We tuned the weightings between the cost terms such that the trade off (especially between costs on force and time to complete) caused the optimization to more equally explore the Pareto front while searching mostly in the space where success rates remained on the same order of magnitude. More details on this tuning with a nine dimensional parameter space are discussed in Killpack (2013).

For quasistatic MPC, we used the nominal tuning from Jain et al. (2013) which is one limitation of our comparison since further tuning might improve performance.

For all of the simulation trials we set the control horizon \(H_u = 4\) and prediction horizon \(H_y = 1\) according to their description in Sect. 3.4. The parameter \(d_{impact}\) as described in Sect. 3.2.3 was set to 0.02 m for all joints which is the diameter of a single cylinder from our simulation testbed. The task for all simulations was specified as reaching to a goal location and we used the same stopping criterion for success or failure as used in Jain et al. (2013), and Killpack and Kemp (2013). The trial was deemed a success if the arm reached the goal within 2 cm. It was deemed a failure if the max forces were greater than 100 N or the time took longer than 100 s.

We used three main criteria for comparing dynamic MPC, which uses multiple time steps and a dynamic model, to the single-step quasistatic MPC. The first was overall success rate of reaching to a goal position through the simulated clutter (see Sect. 6.1.1). The second criterion was comparing the ability of the two controllers to keep their contact forces below or near the specified desired force threshold (see Sect. 6.1.2). The third criterion was the average time to complete the task for the intersection of successful trials for the two controllers (see Sect. 6.1.3). Table 1 summarizes the results of these comparisons.

Fig. 12
figure 12

These graphs shows how the percent of overall contact forces above the force value in the x-axis decrease as that force value increases. The graphs are organized in columns and rows such that the left column corresponds to low clutter (20 objects), and the right column to high clutter (80 objects). The top row corresponds to a high allowable contact force threshold (25 N) while the lower row corresponds to a low force threshold (5 N)

Table 1 Summary statistics for comparing success rate, contact force control and time to reach the goal between dynamic and quasistatic MPC at different densities of clutter and for different force thresholds

6.1 Comparison results from ODE simulation

6.1.1 Success rates

Dynamic MPC achieved equal or higher success rates than the quasistatic MPC for all force threshold and clutter settings. In addition, we used a standard significance test for comparing two proportions (McClave et al. 2008) and showed that for two of the four test settings, dynamic MPC’s higher success rate was statistically significant with a p-value less than 0.05.

6.1.2 Contact force regulation

For the 99th percentile contact force value (meaning that 1/100 forces measured above the noise threshold was this high or higher), the results were comparable for the two controllers. Figure 12 shows the calculated percentage of contact forces in the y-axis that were above the force value shown on the x-axis for both clutter and force threshold settings. Both controllers are able to successfully affect the measured contact forces according to the force threshold specification. Another way to compare the controllers is to look at events that occur comparatively infrequently, but can have catastrophic effects depending on the specific application. One example of this is extremely high forces. Table 1 shows the value for forces that occur at least once in 1000 force measurements (which at 100 Hz sampling rate is occurring every 10 s that the arm is in contact) as well as the maximum overall measured force. For these two forms of evaluation, dynamic MPC does better in all four settings (high/low clutter and high/low force threshold).

6.1.3 Comparing speed of robot end effector

Table 1 also shows that in all cases dynamic MPC completes the task faster than quasistatic MPC. The speed increase using dynamic MPC for the task ranges from 1.45 to 2.04 times faster than quasistatic MPC. This means that while dynamic MPC is at times moving up to twice as fast as quasistatic MPC, it is still successfully controlling the force and reaching the goal as seen in the previous sections.

6.2 Limitations of software simulation results

For the simulation results in this section, we have reported results on fixed, stiff objects only. This is in part because fixed, stiff objects initially proved more problematic for controlling the forces using a dynamic model. However, we noticed that for the movable objects that we had used in Jain et al. (2013), dynamic MPC also moved too slowly and conservatively. This is likely due to the contact model where we used a fixed, high stiffness value to represent each contact. Interestingly, in the experiments that we conducted with the real robot, DARCI, the controller performed in foliage environments successfully moving leaves and branches (see Sect. 7). This may indicate that this issue is less important in real world scenarios.

7 Results for global task of reaching in clutter using the dynamic model predictive controller with a real robot

In this section we present results using dynamic MPC on the real robot DARCI. We show the ability of dynamic MPC to reach randomly generated goal locations in the clutter. We also show that according to the tactile sensor measurements, the arm is able to control its forces.

7.1 Cluttered environment used for testing

The environment that we used for testing reaching in clutter was the same as in Bhattacharjee et al. (2013). This environment consists of plastic leaf-like vegetation and solid wooden trunks as can be seen in Fig. 14. Certain sections of the workspace with leaf-like vegetation can be difficult to push through if the robot respects an allowable contact force threshold. This is due to the density and stiffness of the plants. Figure 14 also shows the coordinate frame used, which is centered between the two robot arms and oriented as shown.

7.2 Dynamic MPC adaptation for DARCI

Adapting the controller used in simulation in Sect. 6 to run on DARCI involved a few changes. We used the same controller parameters that we identified for simulation except for \(\varDelta t_{impulse}\) and the waypoint magnitude size. These we tuned empirically by first setting the waypoint magnitude size without the joint velocity constraint being active. Then, we tuned \(\varDelta t_{impulse}\) until the impact forces, as measured by our tactile sensing skin, were close to the allowable force threshold.

We also used a different actuation constraint than the one described in Eq. 34. The new constraint describes a limit on the allowable joint torque as follows:

$$\begin{aligned} {\varvec{\tau }}_{min} \le {\varvec{K}}_{p} \left( {\varvec{q}}_{eq}[k]+\varDelta {\varvec{q}}_{eq}[k] -{\varvec{q}}[k]\right) -{\varvec{K}}_{d}{\varvec{\dot{q}}}[k] \le {\varvec{\tau }}_{max} \end{aligned}$$
(36)

Because we used software provided with the robot and relied on the high-bandwidth, real time, low-level impedance control to do gravity compensation, gravity was not explicitly modeled in our dynamic model. The provided gravity compensation had significant errors that we compensated for by changing one term in the overall cost function from Eq. 26 to be as follows:

$$\begin{aligned} \alpha \left\| \varDelta {\varvec{x}}_{des}-{\varvec{J}}_{ee} \left( {\varvec{q}}[H_{u} + H_{y} + 1]-{\varvec{q}}[0]\right) - {\varvec{d}}_{grav}\right\| ^{2} \end{aligned}$$
(37)

where \({\varvec{d}}_{grav} \in {\mathbb {R}}^{3}\) acts as an integral term with saturation limits and anti-windup. The method for calculating \({\varvec{d}}_{grav}\) is described in Algorithm 1 where \({\varvec{e}}_{o}[k] \in {\mathbb {R}}^{3}\) is the error between the desired and current end effector position at the current time step, and \(x_{th}\) is a user-defined scalar distance from the desired goal position. The variable \(k_i\) is small and \(d_{grav}\) is evaluated once at each time step. Finally the “for” loop shown in Algorithm 1 checks if the sign is different between the sum of the error and the current error for each Cartesian direction. If the sign is different, that term is set to zero to avoid overshoot due to the integral term.

figure d

For this controller, we set the control horizon \(H_u = 2\) and prediction horizon \(H_y = 3\) which gives three time steps for control (since \(H_u\) begins at zero) and then predicts the output for another four steps (since the controller predicts the state up to \(H_y + 1\)). The convex optimization problem was generated by CVXGEN (Mattingley and Boyd 2012). Although the optimization could solve an MPC problem at a rate between 50 and 100 Hz, we ran the controller at 25 Hz with 40 ms time steps. We did this because jitter in the control loop due to using a non-realtime operating system with Python and ROS required that the controller run at a lower rate to get consistent timing in the control loop. Note that as long as the dynamic model used in MPC does not go unstable during our experiments because of the size of the time step, our discretization incorporates the time step into the model implicitly. For testing on DARCI, we also moved the constraint that describes our impulse–momentum model (see Eq. 35) into the cost function because noise on our joint velocity signal was causing the optimization to be infeasible when any joint was operating near its constrained joint velocity value.

7.3 Controller evaluation for reaching in clutter

To generate goal locations for reaching in the simulated foliage with DARCI, we first estimated the workspace of the arm in the foliage. We extended the arm as far as possible forward in the x-y plane (according to Fig. 14) while moving through the workspace from left to right and recording the end effector positions. This traced out an approximate semi-circle that covered most of the foliage. We then randomly sampled from a uniform distribution over x and y using this approximate semi-circle in the x-y plane while also randomly sampling from a uniform distribution in z between 5 and 30 cm above the ground plane of our testbed in the z-direction to determine goal locations. For testing, the robot reached into the clutter and recorded data for 105 reaches. Figure 13 shows DARCI reaching into the foliage. Extension 1 is a video that shows multiple successful and unsuccessful reaches. For each trial, the arm attempted to reach the goal for 20 s before classifying the attempt as success or failure. Success was determined as when the end effector was within 4 cm (or approximately 1.5 inches) from the desired goal.

7.4 Results in terms of success rates, contact forces and speed

The success rate for reaching goals was 85.7 % (90 out of 105 trials). Over 95 % of the successful trials where the arm reached the goal within 4 cm, happened in under 7 s which was well before the maximum allowable time of 20 s. Figure 14 shows the distribution of the goals and the end-effector starting location.

Fig. 13
figure 13

Sequence of images from one trial out of 105 trials where DARCI reached into simulated foliage while controlling estimated contact forces using dynamic MPC

Fig. 14
figure 14

Top: Distribution of goals (green) for the 105 reaching trials as well as the starting end effector position for all trials (blue). Bottom: Image showing DARCI with its arm partially extended. The farthest goals from the starting position in the figure on the left were when the arm was almost fully extended (Color figure online)

The average time to complete for all of the successful trials was 3.0 s (noting that the end effector traveled an average distance of 35 cm across these same trials looking at the total end effector path traveled). The average velocity at the end effector for successful trials was 22.2 cm/s and for all trials was 12.5 cm/s. In these trials we found that the end effector moved on average 7.5 times faster for successful trials than velocities reported for quasistatic MPC on the robot Cody (Jain et al. 2013).

The average force as sensed by our tactile sensor for all contact forces above the 0.2 N noise threshold was 1.6 N, while the average force for all forces above the threshold (\(f_{thresh} = 5\) N) was 8.1 N. In terms of high forces, the average maximum contact force per trial across all trials was 0.9 N (since many trials had very low maximum forces) and for contact forces above the 5 N threshold the average maximum force was 8.8 N. A histogram for all contact forces is shown in Fig. 15. There is a sharp drop in the number of forces above the specified 5 N threshold in this histogram. However, there are still a number of forces up to the absolute maximum force sensed which was 13.9 N. In general, we would expect that our max forces when moving faster would be slightly higher. However, with the impulse–momentum constraint we have a way of explicitly trading between maximum forces and joint velocities.

Fig. 15
figure 15

Histogram of measured contact forces across all trials using calibrated tactile sensing skin (Color figure online)

Table 2 summarizes the results and contains the values that are relevant to our task of reaching in clutter with dynamic MPC.

8 Results with local tests for reaching in clutter

Section 7 showed that our system enabled a robot to do fast reaching in clutter while keeping forces low. The trials from previous sections entailed examining a global task of reaching a Cartesian goal location. For this section, we performed tests to analyze the effect of varying the allowable contact force while measuring ground-truth data from force–torque sensors. On the real robot, we performed short trials representative of situations that resulted in high forces or failure in simulation.

Figure 16 shows a typical setup for the robot and the force–torque sensors for these trials. The aluminum rod has a dense foam covering and is attached to the force torque sensor and table using laser cut acrylic. We recorded the data from the force–torque sensors at 100 Hz. For the results we report next, the terms “left” and “right” are in reference to the robot’s frame of reference where left is in the positive y direction according to the coordinate frame in Fig. 14.

Table 2 Results for dynamic MPC reaching in foliage

8.1 Evaluating single contact force control with tactile sensor

In these tests we started the arm on either the left or the right side of a single force–torque sensor post (see Fig. 16). For each trial, the robot reached to a pre-defined goal on the opposite side of the post and made initial contact somewhere along the forearm, wrist or end effector. After making initial contact, the arm continued to try reaching the goal using dynamic MPC. The arm was often successful at reaching from right to left, but from left to right would usually get stuck in a local minimum. We refer to a local minimum as a configuration in which our controller would require a different initial condition to be successful or would require a high-level planner to generate Cartesian way points around the obstacle. Figure 17 shows how this trial was executed. Extension 2 is a video of one trial where DARCI reaches from left to right and then right to left.

Fig. 16
figure 16

Force–torque setup that we used for gathering ground truth contact force data

Fig. 17
figure 17

Sequence of images shows the arm reaching from left to right on top, and right to left on bottom while making contact with the post. a Reaching from left to right. b Reaching from right to left

For these trials we varied the force threshold (\(f_{threshold}\)) between 5, 10 and 15 N. We ran 10 reaches for each direction (left to right and right to left) resulting in a total of 20 trials at each setting. The purpose of these tests was to verify that varying the force threshold (\(f_{threshold}\)) had the expected effect on the real robot.

Figure 18 shows the summary statistics for each set of 20 trials as we held \(\varDelta t_{impulse}\) fixed at 4 s while varying \(f_{threshold}\). We calculated the resultant force from the force–torque measurement and used its magnitude as ground truth for a contact force. The statistics that we plotted include the maximum contact force, and the 99th and 50th percentile contact force for each value of \(f_{threshold}\). The correlation coefficient between the force threshold value and the 99th percentile forces is 0.99975.

As expected, increasing \(f_{threshold}\) for the controller is positively correlated with the maximum and 99th percentile forces that we measured while reaching in contact even if our tactile sensor calibration had some error in terms of ground-truth forces.

8.2 Performance of dynamic MPC for multi-contact scenarios

The local tests and results shown in Sect. 8.1 are for contact with a single object. However, multi-contact scenarios often occur in clutter. Phenomena that can occur during multi-contact and cause high forces include wedging and jamming (see Mason 2001). These situations are similar to what happens to our robot arm, for example when simultaneous contact on both sides of the arm cause the arm to quickly decelerate with high contact forces, at which point jamming can occur.

Fig. 18
figure 18

Maximum, 99th, and 50th percentile forces as \(f_{threshold}\) increased. Each data point represents 20 trials worth of data measuring the ground truth forces with a force–torque sensor at 100 Hz

Fig. 19
figure 19

Multi-contact Test 1: DARCI reaching to the left while starting in contact at the end effector

From a wide range of multi-contact scenarios we found in simulation trials from our previous work (see Jain et al. 2013), we developed a set of multi-contact trials. We examined the histograms of contact locations on the three link simulated arm shown in Fig. 5 and identified contact configurations that occurred frequently or resulted in high force. We then manually confirmed that the multi-contact trials were representative of common configurations by randomly sampling and visualizing the actual contact configurations from the simulation trials represented in the contact histograms. These tests are thus subjectively defined, but based on objective data for thousands of tests in simulation.

For each set of trials, we present a figure showing a typical arm trajectory we observed and force data. The resulting force data from the force–torque sensors was the focus of these tests rather than success in reaching the goal.

For all of these multi-contact trials we used \(f_{threshold} = 5 \mathrm{N}\) and \(\varDelta t_{impulse} = 4 \mathrm{s}\). We ran 20 trials for each multi-contact task. The termination criterion for each trial was successfully reaching the goal or a timeout of 15 s. Note that in these trials it was common for two taxels on our skin sensor to make contact with the force–torque sensor simultaneously (especially at the wrist joint). When this happened, the force–torque sensor value could be up to double the allowed force threshold and our controller would still be successfully controlling skin sensor forces since it was measuring and controlling with respect to two separate contacts.

8.2.1 Multi-contact Test 1

Figure 19 shows the progression of multi-contact test 1 over time. In this test, the robot arm starts in contact the tip of the end effector on the left side and in the middle of the forearm on the right hand side. In the second image in the sequence of Fig. 19, the arm first pushes against both rods as it tries to move to the goal. In subsequent motions the arm uses out of plane motion to move around the post and still get to the goal. This is one of the trials that we expected to have large forces due to what we saw in the simulation. However, the 3D motion and extra degrees of freedom made it straight forward for our controller to reach the goal and control the forces at the same time. See Extension 3 for video of this trial.

The contact force results for this trial are in Fig. 20. These results have forces that agree with our expectation of force control where almost all of the forces are below 10 N and the majority are below 5 N.

Fig. 20
figure 20

Multi-contact Test 1: Force results for reaching to the left

8.2.2 Multi-contact Test 2

Figure 21 shows the progression of multi-contact trial 2. For this test, the arm started in contact with two posts and reached to the right between them. The space between the two posts is much smaller in this trial when compared to the first trial. In this test, the arm could not reach the goal. See Extension 3 for video of this trial.

Fig. 21
figure 21

Multi-contact Test 2: DARCI in contact with two posts in the middle of its forearm and reaching to the right

For the force results of this trial we had higher forces than for the first test. It should be noted that this configuration was more difficult than the first trial as the posts were placed closer together and we expected to have high forces due to jamming in this situation. Most forces were still limited to be below 15 N. This also matches what our test in Sect. 8.1 showed where the 99th percentile contact force for a single contact was about 15 N and corresponded to a 5 N allowable force threshold on the tactile sensor (see Fig. 22).

8.2.3 Multi-contact Test 3

For multi-contact test 3, the arm started in contact near the wrist and was given a goal such that it kept contact with the first force–torque sensor while trying to wedge between two other force–torque sensors to reach the goal shown in Fig. 23. The goal was not reachable due to the small gap between the two distal force–torque sensors. see Extension 3 for video of this trial.

Fig. 22
figure 22

Multi-contact Test 2: Force results for reaching to the right while starting in contact at the middle of the forearm

The contact forces for this trial are in Fig. 24. The majority of the forces are again below 15 N with a maximum force around 25 N.

Fig. 23
figure 23

Multi-contact Test 3: DARCI reaching to a goal between two posts with the gap between them smaller than the diameter of the arm given the angle from which the arm is constrained to approach

Fig. 24
figure 24

Multi-contact Test 3: Force results for reaching straight ahead while wedging between two posts and making contact with a third post

8.2.4 Multi-contact results discussion

Table 3 contains summary statistics of the forces for all of the multi-contact trials. These statistics include the mean of the maximum forces as well as the 99th percentile of all contact forces and the mean of all contact forces for each test.

Table 3 These are numerical values that summarize the data presented in the force histograms for the multi-contact test cases in this section

In prior work (see Bhattacharjee et al. 2013), this type of Meka robot arm applied upwards of 40 N on a force–torque sensor while moving slowly and using the compliance of the simple impedance controller. In our case, it is clear that dynamic MPC is limiting contact forces to be within the same ranges we saw with quasistatic MPC in Bhattacharjee et al. (2013). In many of the trials on DARCI, there was no way to reach the goal, yet dynamic MPC regulated the forces to be generally low despite the fact that as it is noted in Bhattacharjee et al. (2013) “... the relationship between contact forces and taxel output is complex.”

8.3 Discussion of all local results

The results in this section show that the contact force trends we saw with measurements from only the tactile sensors while reaching in foliage (see Sect. 7) hold even when we use force–torque sensors to obtain ground truth.

9 Applications

An open question is the extent to which our controller can be used for other tasks, robots, and environments. We expect that controllers designed for fast reaching with whole-arm contact may eventually serve as general purpose controllers for interaction with people in situations such as in-home assistance or search and rescue.

An anecdotal example of the kind of complex tasks that we envision is shown in the video of Extension 4. Because of the speed of dynamic MPC we can teleoperate DARCI with a Phantom Omni haptic input device to reach into an unmodeled and non-rigid canvas bag with unknown objects inside. The Cartesian position of the haptic input device is measured, scaled and fed to dynamic MPC as a goal position at each time step. The robot is then able to move towards this goal location while locally controlling the contact forces so as to not damage itself or the contents of the bag. The bag is also fixtured at one point to the table with a clamp.

10 Conclusion

We have shown that across a large number of simulated trials, using dynamic MPC while incorporating a linear approximation of the nonlinear dynamics of a robot arm in contact gives significant performance improvements in comparison to quasistatic MPC. In particular, we have shown that for two different settings of clutter and two different force thresholds, dynamic MPC had better success rates, faster speeds and comparable force control. We were also able to run dynamic MPC on the real robot DARCI using the manufacturer’s nominal dynamic parameters (i.e. no complex system identification was necessary). In addition, we have shown that running this controller in Python at around 25 Hz can result in good performance. We would expect that further engineering efforts would allow one to increase the control rate and further improve our force control. A variant of our dynamic MPC has even recently been used successfully in conjunction with haptic mapping, geometric planning, learning and tactile perception (Bhattacharjee et al. 2014).

Our results show the possibility of reaching at faster rates into cluttered environments while controlling velocities, forces, and mitigating the effects of unexpected impact.