A parallel Hamiltonian formulation for forward dynamics of closed-loop multibody systems

This paper presents a novel recursive divide-and-conquer formulation for the simulation of complex constrained multibody system dynamics based on Hamilton’s canonical equations (HDCA). The systems under consideration are subjected to holonomic, independent constraints and may include serial chains, tree chains, or closed-loop topologies. Although Hamilton’s canonical equations exhibit many advantageous features compared to their acceleration based counterparts, it appears that there is a lack of dedicated parallel algorithms for multi-rigid-body system dynamics based on the Hamiltonian formulation. The developed HDCA formulation leads to a two-stage procedure. In the first phase, the approach utilizes the divide and conquer scheme, i.e., a hierarchic assembly–disassembly process to traverse the multibody system topology in a binary tree manner. The purpose of this step is to evaluate the joint velocities and constraint force impulses. The process exhibits linear O(n)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$O(n)$\end{document} (n\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$n$\end{document} – number of bodies) and logarithmic O(log2n)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$O(\log_{2}{n})$\end{document} numerical cost, in serial and parallel implementations, respectively. The time derivatives of the total momenta are directly evaluated in the second parallelizable step of the algorithm. Sample closed-loop test cases indicate very small constraint violation errors at the position and velocity level as well as marginal energy drift without any additional form of constraint stabilization techniques involved in the solution process. The results are comparatively set against more standard acceleration based Featherstone’s DCA approach to indicate the performance of the HDCA algorithm.


Introduction
In current research and industrial applications, there is a need for fast multibody (MBS) dynamics simulations. One can encounter applications like that in such areas as robotics, biomechanics, and automotive industry. The other places where one needs efficient MBS simulations can be found in various interdisciplinary applications like, e.g., molecular dynamics simulations [1,2] or granular media physics [3]. The applications are more complex and demanding. They may come from different domains and more often require the exploitation of coupling techniques [4]. In this context, numerical formulations for MBS should provide stable, accurate and reliable results. This short literature review is mainly focused on efficient sequential and parallel algorithms for multi-rigid body dynamics. Also the Hamiltonian based formulations are recalled from the MBS simulations point of view.
Due to the advances in the robotics field, there has been a growing attention to the development of efficient, low order algorithms for the simulation of open-loop and closed-loop kinematic chain system dynamics [5][6][7]. It is especially worth noticing the analogies between the optimal filtering theory and robot dynamics [8] that gave foundations for various mass matrix factorizations of a manipulator. The idea of articulated body inertias (ABI) for solving the equations of motion for an n-link manipulator, originally introduced by Featherstone [9], expanded into various forms and flavors. A summary of ABI-like algorithms can be found in [10]. There are also recent advances in the field which can be found in different papers, e.g., [11][12][13]. The mature technology associated with recursive formulations gave basis for further development of efficient low order parallel algorithms.
With the advances in parallel computer architectures, researchers paid more attention to the development of parallel algorithms for multibody dynamics. This trend can be observed in early works in this field, e.g., [14][15][16]. In 1999, Featherstone [17] developed truly optimal-time, logarithmic order divide-and-conquer algorithm (DCA) for the dynamics of tree-like topologies as well as closed-loop multibody systems. Other authors [18] preferred the idea of a decomposition of multibody system into sub-chains. In [19], Critchley and Anderson explored the notion of recursive coordinate reduction. There are other works associated with parallel multibody dynamics simulations as well [3,20,21]. Recently, the divide-and-conquer based schemes [17] have attracted significant attention to the development of efficient parallel algorithms for large MBS, partially due to the fact that computationally powerful multicore processors or graphics processor units are cheaply available on the market. Various algorithms that are effectively based on the Featherstone's DCA are elaborated with myriad of extensions to, e.g., humanoid robot simulations [22], molecular dynamics simulations, real-time applications, discontinuous changes in the system topology [23], sensitivity analysis, constraint enforcement [24][25][26][27], nonholonomic constraints, flexible multibody systems, or generalized constraint treatment. Advances in the application of the DCA approach to large multibody system simulations can be found in [28].
On the other hand, Hamilton's canonical equations constitute an interesting alternative as a basis for elaborating low order formulations. As Hamilton's canonical equations are expressed in terms of velocities and momenta of the system and the possible constraints are usually imposed at the velocity level, many sources indicate profitable characteristics of the Hamiltonian approach from the numerical point of view. Various authors exploited this approach for real-time simulations of vehicles [29], in the analyses of MBS subjected to intermittent motion [30], and in the development of penalty methods for constrained mechanical system dynamics [31]. Other interesting Hamiltonian based low order algorithms can be found in [32], where a recursive O(n) algorithm for multi-rigid-body dynamics has been presented. Initially, the formulations are elaborated for serial chains and subsequently extended to deal with closed-loop topologies. Nevertheless, the approach is purely sequential since it is based on the idea of ABI. Surprisingly, the literature review reveals that it is difficult to find a fully parallel approach for multi-rigid body dynamics simulations based on the Hamiltonian approach.
In previous papers [33,34], the authors proposed a Hamiltonian based divide-andconquer algorithm (HDCA) for open-loop multi-rigid body dynamics. In this paper, the HDCA algorithm is significantly extended and generalized to deal with systems possessing closed-loop and coupled-loops topologies. The method solves Hamilton's canonical equations for constrained multi-rigid-body system in a divide-and-conquer manner. While the state of the mechanical system is described in terms of joint coordinates and joint momenta, the absolute coordinates are used as well to clearly present the underlying relationships between the variables. The HDCA recursive procedure for closed-loop multibody systems allows one to evaluate joint velocities and time derivatives of joint momenta with the aid of velocity level constraint equations in an efficient and parallelizable manner.
This paper is divided into four sections. Following the Introduction with brief literature review, Sect. 2 introduces basic concepts and definitions exploited in the paper and demonstrates novel Hamiltonian based parallel formulation for multi-rigid-body system dynamics possessing closed-loop chains. Section 3 describes and discusses the numerical results that were gathered to indicate the accuracy and efficiency of the proposed formulation. Original contributions of the paper are summarized in the last section.

Hamilton's canonical equations
This section provides a detailed formulation of the HDCA algorithm for the dynamics of complex multibody systems by utilizing the Hamilton's approach. Before we begin with the algorithm development, Hamilton's canonical equations for constrained multi-rigid body systems are recalled in a brief manner. At first, the classical equations of motion in dependent canonical coordinates are formulated. Subsequently, more elaborated form of Hamilton's equations is demonstrated through the exploitation of the augmented momenta.
The state of the multibody system having n degrees of freedom can be described by a set of first-order Hamilton's differential equations in the form [35,36] where q n , p n are system's canonical coordinates and momenta, Q n are external, applied and centrifugal forces, and are m algebraic constraint equations, which have to be fulfilled and which are associated with the unknown Lagrange multipliers λ m . The terms q and t represent the partial derivatives of the constraint equations with respect to the generalized coordinates and time, respectively. The Hamiltonian function is the Legendre transformation of the Lagrangian function, and it is defined as We consider mechanical systems subjected to holonomic constraints which are assumed to be independent. The motion of the system is analyzed with respect to the inertial reference frame. Since the total kinetic energy T is a quadratic function of the generalized velocities, it can be expressed with the use of canonical momenta where M is the mass matrix of the system. As presented in [30], it is possible to reformulate Hamilton's equations of motion to obtain a form which is more suitable for numerical implementations where σ m are the unknown constraint Lagrange multipliers impulses associated with the constraint equations at the velocity level, and P n are called augmented momenta vectors. One can combine (6) together with the velocity level constraint equations (3) to get a more concise matrix form Equations (7) and (8) describe the motion of a system through a set of 2n + m equations, which have to be solved and integrated at each time-step to determine the state of the system in the next time-instant. To be able to solve the differential-algebraic equations, one has to specify initial conditions for positions and momenta at joints, which have to be consistent with the imposed constraint equations. It is easier to specify initial conditions for position of a system. The initial conditions for momenta p n can be calculated using (6). In order to do that, one has to assume that at the initial time-step all constraint force impulses are known to be zero (σ m = 0), and no momentum is absorbed at constrained directions (see [30] for a more detailed discussion).

Joint velocities and constraint force impulses
Articulated momentum definition and the integral form of the equations of motion At an arbitrary point O, which is not coincident with a body's center of mass C (see Fig. 1b), the spatial momentum vector for a general body can be defined in a matrix form as P O ∈ R 6 [32] All expressions in (9) are derived with respect to the local coordinate frame with the origin at point O and then expressed in the global reference frame. The term M O ∈ R 6×6 denotes a mass matrix of the body for a given vector l OC between point O and point C, while V O is a spatial velocity vector composed of translational v O and rotational ω components. The term m is the mass of that body and J O is the moment of inertia with respect to the axis of local reference frame defined at point O and expressed in the global reference frame. The identity matrix is defined as I ∈ R 3 and the tilde symbol above a vector defines a skewsymmetric matrix associated with this vector. Let us consider a physical or compound rigid body A connected with the rest of the multibody system by arbitrary kinematic joints at the boundary points O 1 and O 2 as depicted in Fig. 1. From now on, let us consider such points as body's interfaces called handles, by means of which the body communicates with the rest of the system using both active and constraint force components. Particularly, the spatial articulated momentum vector P ∈ R 6 of linear and angular momenta related to a handle can be defined as where H ∈ R 6×n f represents the joint's motion subspace (n f -number of degrees of freedom for the joint) and D ∈ R 6×(6−n f ) is associated with the joint's constrained directions [24,26]. We assume that both matrices H and D consist of ones and zeros. The following identities hold in the formulation: H T H = I, D T D = I. The matrices H and D are orthogonal to each other, which yields the condition Equation (10) assumes that the articulated momentum vector taken up from the system can be decomposed into the parts related to the active component Hp and constraint component T = Dσ . Vector p ∈ R n f ×1 denotes the joint canonical momenta, whereas vector σ ∈ R 6−n f indicates constraint load impulses (Lagrange multipliers at the velocity level) at joints. The following condition holds for the Lagrange multipliersσ = λ [30]. For an exemplary body A (see Fig. 1) with two handles at points O 1 and O 2 , the integral form of the equations of motion can be formulated with the use of articulated momentum vectors as where the subscripts 1 or 2 indicate the handles O 1 and O 2 . The above equations make use of the shift matrices S A 12 and S A 21 , which are a handy tool for the transformation of sixdimensional momentum, force, and velocity vectors to other points of operation. If the position vector l 12 is measured from the point O 1 to O 2 , the shift matrix takes the form where 0 ∈ R 3×3 is the null matrix of appropriate size and l 12 is a skew-symmetric matrix. By performing minor modifications, (12)-(13) can be rearranged (by using (10)) into the more Fig. 2 The assembly of bodies A and B into body C favorable form Equations (15)-(16) provide a direct relationship between the spatial velocity vectors V A 1 , V A 2 of body A and constraint force impulses acting on the terminal bodies T A 1 , T A 2 . These equations form an integral form of the equations of motion for a general body A with two handles. The recursive assembly-disassembly procedure allows one to calculate all constraint force impulses and joint velocities in a divide-and-conquer manner. As a final remark, it is worth noticing that the matrix coefficients ξ A 11 , ξ A 12 , ξ A 21 , ξ A 22 for the velocity level equations have the same form as the coefficients formulated in the original Featherstone's paper (cp. [17]) for acceleration based equations. The coefficients include the system's mass and inertia parameters. On the other hand, the quantities ξ A 10 , ξ A 20 are associated with joint's canonical momenta and joint positions.

Assembly phase
This subsection formulates the integral form of the equations of motion for a compound body C. Let us define a set of bodies A and B interconnected by a kinematic joint as shown in Fig. 2. The integral form of the equations of motion for body A and body B can be written as The goal of the hierarchic assembly procedure is to construct the integral form of the equations of motion for body C in the analogous form as in the case of body A and B, i.e., where the handles O 1 and O 2 of the compound body C are coincident with handle O 1 of body A and handle O 2 of body B, respectively. The same form of the equations enables us to utilize the divide-and-conquer paradigm to assembly the whole system into one super-body, whose velocities depend on constraint force impulses at the terminal handles. In other words, the expressions ξ C 11 , ξ C 12 , . . . , ξ C 20 for body C have to be evaluated in terms of the known quantities for bodies A and B. The procedure requires an intermediate calculations of the interbody force impulses T B 1 and T A 2 using the expressions for force impulses T A 1 and T B 2 . The presence of the interconnecting kinematic joint imposes the velocity level conditions given in the form where H is the joint motion subspace that maps joint velocity vectorsq into the spatial velocity vectors. Let us substitute (18)- (19) into (23). By premultiplying the resulting conditions by D T , we obtain Newton's third law of motion has to be satisfied, thus the constraint force impulses have to satisfy the condition which enables us to use relation (25) in (24) to express the Lagrange multipliers σ as a function of constraint force impulses T A 1 and T B 2 at the terminal bodies The inverse matrices appearing in (26) exist since both matrices ξ B 11 and ξ A 22 are symmetric and positive definite as long as the constraints imposed on the system are independent. To simplify the formulas, the following matrices are defined: Hence, (25) becomes and enables us the calculation of constraint force impulses at the interconnecting joint using constraint force impulses at handle O 1 of body A and handle O 2 of body B. The final step involves the substitution of (29) into (17) and (20) As the handles of the compound body C are coincident with the leftmost handle of body A, and with the rightmost handle of body B, respectively, the following equations are fulfilled: The comparison of (30)-(31) with (21)- (22) leads to the following recursive expressions: Fig. 3 Possible assembly-disassembly process for an exemplary multibody system with closed-loops. The process allows one to assemble the multibody system starting from individual bodies denoted by ellipses to build compound (or virtual) bodies. At the end of the assembly process, the whole MBS is constructed, which enables one to exploit the boundary conditions for the base body connections. In the disassembly phase, the process is reversed in order to decompose the system into individual bodies Equations (34)-(36) constitute the main formulas for the divide-and-conquer assemblydisassembly procedure, which enables us to construct the integral form of the equations of motion for the articulated body C consisting of bodies A and B. At this point it is possible to perform the hierarchic assembly of the multibody system based on the binary tree decomposition (see Fig. 3). The efficient evaluation of the binary tree associated with the multibody system is a separate task and it will not be considered in this paper. The step finishes when the whole multibody system is assembled into one super-body. The assembly phase is summarized in Pseudo-algorithm 1.
Base body connection Let us define the integral form of the equations of motion for the whole system represented by one super-body as Constraint load impulses at point O 1 and point O 2 (provided that they exist) can be linked with the Lagrange multipliers σ 1 and σ 2 as On the other hand, the joint velocities can be expressed aṡ Algorithm 1 Pseudo-algorithm for the assembly phase Inputs: -Coefficients ξ of bodies A and B, -External load vectors Q of bodies A and B. Procedure: 1. Compute matrices W and β in (27) and (28) Outputs: -Coefficients ξ of body C, -External load vectors Q of body C.
At this point, three cases are possible: (a) The mechanism is free-floating. The boundary force impulses are equal to zero (T C 1 = 0, T C 2 = 0), and the expressions for the spatial velocities of terminal bodies simplify to The mechanism has an open-loop or tree-like topology. The handle O 2 refers to the tip of the chain and the constraint force impulse T C 2 is known to be zero. Thus, (37) can be reduced to the following form: Let us premultiply (42) by D T 1 . By using (39), one can resolve the relations for the unknown Lagrange multipliers σ 1 to get Since ξ C 11 remains symmetric and positive definite, there is no problem with finding the inverse. At this instant, the absolute velocity V C 1 is determinable by using (42).
Equations (44) enable us to calculate the unknown boundary Lagrange multipliers Usually, the inverse matrices in (45) exist. However, there are some circumstances, in which the calculation of constraint load impulses is not so straightforward. The prob-lems with matrix inversions may arise when the system passes near or through the singular configuration or/and the system is redundantly constrained. The redundancy results in problems of joint reactions solvability [37,38]. Nevertheless, let us assume that the boundary loads impulses have been successfully evaluated at this phase. With the boundary Lagrange multipliers σ 1 , σ 2 known, the velocities that connect the assembly with the fixed base-body can be computed as Analogously, the above scheme can be easily adapted to cope with more than two handles per body.
Disassembly phase As soon as the constraint force impulses T C 1 and T C 2 are available, the disassembly phase is started (cf. Fig. 3). The values of T C 1 and T C 2 can be transferred to the proceeding nodes of the binary tree to recursively compute all interconnecting constraint force impulses utilizing equation (29) and handles coincidence conditions (Eqs. (32)- (33)). When the interbody constraint force impulses are evaluated for all joints, the spatial velocities can be computed by using (15)- (16). Finally, it is possible to determine the joint velocities with the aid of projection onto the joints' motion subspaces (see Eq. (23)). The disassembly phase is summarized in Pseudo-algorithm 2. The computational steps described above are equivalent to the evaluation of the first set of Hamilton's canonical equations (1). The proposed method allows one to recursively calculate the derivatives of canonical coordinates without the need of determination of the system's Hamiltonian and its partial derivatives ∂H/∂p.

Derivatives of canonical momenta
This part of the paper is devoted to the calculation of time derivatives of canonical momenta. Initially, the equations of motion for MBS are formulated in terms of absolute momenta.
Subsequently, the notion of articulated and accumulated momenta are exploited to form the equations of motion in a divide-and-conquer manner. Finally, the expressions for the time derivatives of canonical momenta are demonstrated.

Equations of motion for unconstrained body
According to Newton's law of motion, the derivative of the absolute momenta vector of unconstrained body is equal to the sum of external load vectors. The following equations of motion, expressed with respect to the centroidal reference frame, are initially used: To construct the equations of motion with respect to the local coordinate frame attached at point O = C, one has to define the shift matrix S CO as The shift matrix S CO is defined analogously to (14). Let us note that the spatial momentum vectors transform in the same way as spatial force vectors. The following identities [10] are useful in further derivations: Equations (47) and (48) can be combined to obtain the equation of motion with respect to the local coordinate frame whose origin is attached at point Ȯ where Q O is a spatial external loads vector and P O is a spatial momentum vector related to the body at point O. A closer look at (51) reveals the fact that it is possible to conveniently simplify the expressionṠ CO P O and receivė whereṠ andṠ O consists of a skew-symmetric matrix of point O's translational velocity vector in inertial reference frame.  . 2). Let us assign again two handles per compound body by means of which the body interacts with other bodies in the system. The equations of motion for such system can be written in the forṁ

Equations of motion for constrained body
where index 1 refers to the leftmost handle, and index 2 refers to the rightmost handle of a body. The matricesṠ A 1 andṠ B 1 contain absolute, translational velocities of body A and body B expressed at handle 1 of the first and second body, respectively. The terms F 1 and F 2 denote reaction forces, which have to satisfy Newton's third law F B 1 = −F A 2 . Newton's third law of motion enables us to substitute F B 1 from (55) into (54) to construct the expression for the coupled, articulated bodẏ The equation can be reconstructed to a more suitable form using the simplifying transfor- Using the following substitutions and including the appropriate handles coincidence relations (57) takes a more concise form, similar to the relations in (54)-(55) where P C 1 and Q C 1 are accumulated momenta and accumulated external loads. At this point it would be possible to perform the whole multibody system hierarchic assembly according to the binary tree decomposition. The purpose is to define the equation of motion of the whole system in terms of the boundary reaction forces. This step finishes when the base-body is reached. Now, let us notice that the joint canonical momentum can be found as p 1 = H T 1 P C 1 [32]. Therefore, by exploiting (60), the time derivative of this expression yieldṡ The drawback of (61) lies in the presence of the unknown reaction force F C 2 . Secondly, although (60) and (61) look promising, it would be more beneficial to express the equations of motion in terms of articulated momenta which at this stage are already available.
The common procedure in the modeling of constrained multibody systems possessing closed-loops is to virtually cut redundant joints to generate a tree-like system. This is done by introducing the unknown constraint forces at cut locations. The idea formally leads to an extended Lagrangian and extended Hamiltonian functions for the considered multibody system, derivation of which is presented in [30] and briefly recalled in Sect. 1. Let us assume that the compound body C is virtually cut at handle O 2 , which means that The comparison of the accumulated momenta definition (58) with the integral form of the equations of motion (compare (12)-(13)) yields These expressions proclaim the fact that there is a clear dependency between accumulated and articulated momenta. Substituting (63) into (60) and exploiting (62) gives which simplifies to and allows eliminating reaction force F C 2 with the derivative of (62). Let us premultiply (66) by the boundary joint's motion subspace H T 1 to remove the dependence on the constraint force F C 1 = D 1σ 1 , yieldinġ Please note that the quantityṗ 1 is dependent on the known articulated momenta. Based on (62) and the assumption that there is a cut joint at handle O 2 , the value of the momentum's derivative at handle O 2 is equal to zero, i.e., which is derived with the use of the identity in the form H T 2Ḋ 2 = −Ḣ 2 D 2 . Any joint in the system that opens the loop(s) can be chosen as a cut joint. It is assumed that the time derivative of such joint's canonical momenta are equal to zero.
The above formalism can be extended to deal with any joint of constrained multibody system. The derivative of joint's canonical momenta k related to any body K can be expressed in terms of already available articulated momenta aṡ where subindex c designates a cut joint, P K 1 and P c are the articulated momenta for joint k and cut joint c, respectively. The derivatives of shift matricesṠ K 1 andṠ c contain the translational velocities of handle k and handle c computed with respect to the global, inertial reference frame. The application of this form of equations of motion enables us to include unknown reaction force at cut location c within the definition of articulated momentum vector. Equation (69) allows us to calculate the time derivatives of canonical momenta, by having a vector of accumulated external loads Q K 1 , which is only unknown in this equation.
Articulated external loads. The purpose of this section is to demonstrate how to calculate the articulated external loads by using a divide and conquer strategy. Let us define articulated external loads vectors Q at handle O 1 and handle O 2 for bodies A and B, respectively (cp. Fig. 2). Let us also construct the conditions describing the relation between external loads vectors as One can substitute (71) into (70) and utilize (72) to algorithmically assemble the forces for body A and B By applying the relations (58) and handles' coincidence conditions for compound body C (compare with (32)-(33)), one can obtain the following final form: Equation (74) presents the relation between the accumulated and articulated external loads vectors. Considering the whole multibody system assembled into one super-body C, the articulated loads vector Q C 2 is equal to zero as we assume that it concerns the cut joint (see (62)) of a closed-loop system. Thus, the base-body connection conditions can be taken into account to receive By knowing the values of the articulated loads vectors for specified handles, by means of which the system C connects to the base-body, it is possible to perform a back-propagation phase by utilizing the equation The above procedure allows one to efficiently compute all necessary loads vectors in parallel achieving logarithmic O(log 2 n) computational cost. If the external loads do not depend on the velocity terms, the method can be included in the phase of evaluation of joint velocities. Otherwise, it is unavoidable computing the loads separately, or expressing variables Q 1 and Q 2 for all bodies in terms of canonical momenta.

Derivatives of canonical momenta
By knowing all velocities, constraint force impulses, articulated momenta and articulated external loads, it is possible to compute the derivatives of canonical momenta for every joint k related to the body K using the following formula: Equation (77) is equivalent to the second set of Hamilton's canonical equations (2) for constrained multibody systems. There is no need to explicitly calculate the system's Hamiltonian and its partial derivatives ∂H/∂q.

Accuracy results
The non-stiff Dormand-Prince one step solver implemented in MATLAB (ode45) was used with absolute and relative error tolerances set to δ = 10 −6 . Figure 5 presents the total energy preservation error as a function of the simulation time. It can be concluded from the plots that the HDCA algorithm preserves the energy better for all cases within the range of assumed integrator tolerances. Figures 6 and 7 demonstrate the vertical component of the position and translational velocity of point K for the first test case (see Fig. 4). The algorithm is able to reproduce the point's trajectory correctly. The same conclusion may be drawn for other test cases. Figure 8 demonstrates a comparison of the constraint violation errors at the velocity level. The HDCA algorithm combines the equations of motion with the velocity level algebraic constraints. On the other hand, the DCA formulation makes use of algebraic constraints at the acceleration level. There is a huge discrepancy between the results in favor of the Hamilton's divide-and-conquer method. In the case of HDCA, the constraint equations at the velocity level are satisfied within the numerical accuracy. This behavior has an impact on the constraints violation errors at the position level. The position constraint errors for the DCA approach diverge quadratically in terms of the simulation time. On the other hand, the drift made by the HDCA is negligible as indicated in Fig. 9.

Efficiency results
Three MATLAB solvers are chosen to comparatively test the efficiency of the HDCA and the DCA algorithms, i.e., ode45, ode113, and ode23. The Dormand-Prince one step solver (ode45) is suggested as a first try for most problems. The ode113 method is a multistep and variable order Adams-Bashforth-Moulton PECE solver for non-stiff, but computationally intensive tasks. The ode23 is an implementation of an explicit Runge-Kutta (2, 3) pair of Bogacki and Shampine. It may be more accurate than ode45 in the presence of moderately stiff equations. The dynamics of the mechanisms (cases 1-3) presented in Fig. 4 are simulated for 10 seconds and various integration tolerances δ = {10 −3 , 10 −4 , . . . , 10 −9 }. It is assumed that the relative integrator tolerance is equal to the absolute tolerance. A series of numerical experiments have been performed to measure the relative performance of HDCA and DCA formulations. Three numerical test cases are taken into account as described in Sect. 3.1. The total wall-clock time of 10-second simulations and the number of right-hand side function evaluations taken by the integrators are recorded in order to build the comparisons. Figure 10 shows a ratio of the wall-clock time taken to execute DCA algorithm to the wallclock time taken to process HDCA for a given test case. These measures are presented as a function of the integrators' tolerances. On the other hand, Fig. 11 demonstrates the ratio of the right-hand side function evaluations associated with both algorithms in terms of the integrators' tolerances. There are several remarks that can be made based on the plots. The performance of HDCA is in most cases worse than DCA when ode45 integration routine is used for calculations. One case for which HDCA performs better than DCA can be observed for high integrator tolerances (10 −9 ). The situation changes when multi-step ode113 integrator routine is exploited for calculations.
This time HDCA has greater dominance over the acceleration based competitor measured either by using the wall-clock time or by counting the number of right-hand side function calls. It is worth noticing significant computational savings in terms of the wall-clock time for tight tolerances (i.e., 10 −9 ). When ode23 is exploited, one could take the performance advantages for the test case 1 and 2, while HDCA performs a little bit worse for the test case 3.  To complement the results, Table 1 demonstrates the mean wall-clock time taken for a single call of the right-hand side function. In our implementation, the HDCA approach requires less time to calculate the state variables for the test cases 1 and 2 than the analogous procedure for the DCA method. The mean time required to evaluate the right-hand side function is similar for both formulations when coupled-loop test case is considered. Table 2 presents the wall-clock time for the three test cases and three integrators used for the simulations. It turns out that ode23 is much slower than other two integration methods. The fastest method solving Hamilton's and Newton-Euler's equations of motion was ode113. Based on the results in Tables 1 and 2, one could estimate the total number of right-hand side function calls for each simulation.

Discussion
The HDCA algorithm presented in this paper is the approach that enables one to simulate complex multi-rigid-body systems subjected to independent, holonomic constraints in an efficient and robust manner. Hamilton's canonical equations are set and solved in a divideand-conquer framework which enables a full exploitation of parallel computing. Pseudoalgorithm 3 presents a list of detailed algorithmic steps that one has to perform in order to calculate the time derivatives of joint canonical coordinates. The numerical cost of sequential calculations remains linear in terms of the number of bodies n in the system. The method exhibits near logarithmic complexity for n threads available for computations. The parallelizable assembly-disassembly procedure is exploited at several different levels of the algorithm. Velocities, Lagrange multipliers, and time derivatives of canonical momenta are evaluated in a recursive, divide-and-conquer manner. There are two general strategies that can be utilized to parallelize the computations. The recursive HDCA method is connected to the binary tree associated with the topology of a system. Inherently, one can assume that each node of this tree establishes an independent instance of calculation. At each level of the binary tree, one could perform embarrassingly parallel computations. On the other hand, the whole tree could be divided into well-balanced subtrees to equally spread the computational load over each thread.  (74), (d) Set T C 2 = 0 and Q C 2 = 0. 5. Perform disassembly procedure (see Algorithm 2) for each of the assembled entities. 6. Calculate velocities V for each body in (15)-(16). 7. Compute all joint velocitiesq by projection of (23) onto joint's motion subspaces. 8. For each body determine P 1 (10) andṠ 1 in (53). 9. Identify cut joints and introduce constraint force impulses in (62). 10. Compute all derivatives of canonical momenta in (77).
There are several remarks that should be mentioned to properly analyze the numerical results gathered in Sect. 3. Firstly, the numerical examples examined in the paper are planar systems. Nevertheless, the HDCA algorithm can be successfully used to analyze the dynamics of spatial closed-loop systems. The method exploits spatial velocity as well as spatial momenta vectors. No simplifications have been made to limit the applicability of the formulation at any stage of the development. Moreover, the experience of the authors confirms that the HDCA method performs very well for fully three-dimensional multibody systems in terms of the total energy conservation and the constraint violation errors. Secondly, Sect. 3.2 shows that the HDCA algorithm is able to preserve the energy and constraint equations better than the DCA approach for considered closed-loop test cases. This property comes from the fact that HDCA takes the velocity level conditions as primary constraints. It is indicated that only the position constraints are violated, and the resulting errors are much smaller compared to the acceleration based DCA method. Thirdly, Sect. 3.3 demonstrates the performance of HDCA and the comparisons against DCA. The reader should be aware that the integrator absolute and relative tolerances are equal to each other in our implementations. The tolerances are equally assigned to the velocities vectorsq as well as to the momenta vectors p. To compare both formulations more accurately, one should go into the details of the integration routines. This issue is beyond the scope of this paper.
In some applications, there is a need to calculate constraint forces. This issue is particularly important when design or control applications are considered. The HDCA algorithm presented in this paper allows one to calculate the constraint force impulses. The method needs some extensions to be able to determine the constraint forces. One method to calculate the constraint forces is to exploit the conditionσ = λ. The finite difference method may be employed with its own advantages and disadvantages. The other way is to add a separate computational module for the calculation of reaction forces. It should be noted that the additional cost of such procedure is associated only with the evaluation of the matrices ξ 10 and ξ 20 . The other coefficients are already available since they are calculated by the HDCA algorithm. The evaluation of constraint forces requires only a small fraction of the total cost of the assembly-disassembly procedure for the evaluation of velocities and constraint force impulses.
As the constraint forces are considered, one should mention additional property of the algorithm. The constraint force impulse multipliers σ are the primary quantities evaluated by the HDCA method. In the case when a reaction force component λ i never changes its sign during the motion, the Lagrange multiplier σ i may eventually grow indefinitely with the simulation time (compare the conditionσ i = λ i ). This may lead to numerical errors. Presumably, this issue could be circumvented by scaling the subspace of constrained directions D i that in turn would affect the value of the Lagrange multipliers σ i . Obviously, such a scaling should not have any effect on the value of the constraint force impulses T i .
The HDCA algorithm is exploited for the dynamics simulation of systems possessing single kinematic loop and coupled kinematic loops. The idea demonstrated in the text can be generalized to cope with more complex topologies involving many coupled loops, especially when the number of handles per body is greater than two. The mentioned extensions require more detailed analysis of the topology of the system under consideration; however, the concept of the HDCA algorithm remains the same. Apart from the issues mentioned within this subsection, there are several other possibilities for the HDCA algorithm development. Undoubtedly, the adaptation of stabilization methods and symplectic integration procedures would further improve the HDCA algorithm's efficiency and accuracy.

Conclusions
In this paper, a novel HDCA algorithm is presented for the simulation of constrained multirigid body system dynamics possessing closed-loop topologies. The HDCA approach utilizes Hamilton's canonical equations of motion, which are recursively generated in a divideand-conquer framework. The two-or multi-handle equations are formed for the spatial velocities of associated bodies and constraint load impulses at joints. Also the evaluation of the time derivatives of joint canonical momenta can be performed in a fully parallel manner. The comparison of the results with the analogous, acceleration-based divide-and-conquer formulation indicates the predominance of the HDCA method for the considered numerical examples in terms of the total energy conservation and constraint violation errors. Sample closed-loop test cases demonstrate excellent constraint satisfaction at the position and velocity level as well as marginal energy drift for various integration procedures used for calculations. The properties of the method makes the HDCA scheme worth considering in the applications to systems with more general topologies than those presented in the paper.