Co-simulation with variable approximation order: order control algorithm for solver coupling approaches

Considering co-simulation and solver coupling approaches, the coupling variables have to be approximated within a macro-time step (communication-time step), e.g., by using extrapolation/interpolation polynomials. Usually, the approximation order is assumed to be fixed. The efficiency and accuracy of a co-simulation may, however, be increased by using a variable approximation order. Therefore, a technique to control the integration order is required. Here, an order control algorithm for co-simulation and solver coupling methods is presented. The order controller is incorporated into the control algorithm for the macro-step size so that co-simulations with variable integration order and variable macro-step size can be carried out. Different numerical examples are presented, which illustrate the applicability and benefit of the proposed order control strategy. This contribution mainly focuses on mechanical systems. The presented techniques may, however, also be applied to nonmechanical dynamical systems.


Introduction
Co-simulation terms a variety of numerical techniques, which can be applied to couple different solvers in time domain. A classical field of application for co-simulation methods concerns the simulation of multiphysical systems, see, e.g., [13,16,20,49,69]. In this case, each physical subdomain is modeled and simulated with its own (specialized) subsystem solver, whereas each subsystem may be considered as a black-box system. The subsystem solvers are numerically coupled by an appropriate co-simulation method. In literature, many interesting multidisciplinary problems have been solved incorporating co-simulation techniques, e.g., fluid/structure interaction problems [1,43], analysis of coupled finite-element/multibody models [2,3,48], multidisciplinary simulations in the framework of vehicle dynamics [11,36,45] or coupled problems including particle models [14,33,70,76].
Besides this classical area of application, co-simulation may also be applied advantageously to parallelize monodisciplinary dynamical systems, see, for instance, [4, 10, 31, 32, B. Schweizer 1 Department of Mechanical Engineering, Institute of Applied Dynamics, Technical University of Darmstadt, Otto-Berndt-Straße 2, 64287 Darmstadt, Germany 38,44,46,63,74]. Therefore, the global system is split into a certain number of subsystems, which are connected and simulated by a proper co-simulation approach. As a consequence, the overall computation time may be reduced massively.
A very straightforward consideration can be useful to demonstrate the possible computational benefit, which may be achieved by a co-simulation-based parallelization approach. We consider therefore a multibody system with n degrees of freedom, which is decomposed into r equally sized subsystems. The decomposed system is assumed to be integrated with an implicit predictor/corrector co-simulation method, see Sect. 3. Note that for implicit co-simulation approaches, an interface-Jacobian is required, which may, however, be calculated very easily in parallel with the predictor/corrector subsystem integrations so that the computation of the interface-Jacobian will not (markedly) increase the overall co-simulation time. Regarding an implicit co-simulation approach, the main computational effort is produced by the repetition of the macro-step for carrying out the corrector iteration over the macro-interval. Now, we assume that an O (n α )-algorithm is used for solving the equations of motion, where α is usually a value between 1 and 3. Consequently, the simulation time for the monolithic model is given by T Mono = constant · n α . Applying an implicit co-simulation scheme and assuming an average number of l corrector iterations within each macro-step, the simulation time of the co-simulation may roughly be estimated by T Cosim = (1 + l) · constant · n r α + Overhead (one predictor-step and l corrector-steps).
Practical simulations show (see [40]) that the number of corrector steps l is frequently rather small (l = 2 or l = 3). Hence, if the overall model is decomposed into a sufficiently large number of subsystems, the reduction of computation time may become very significant. It should be noted that the estimation formula for T Cosim will only yield reasonable results if the (average) integration step-size of the monolithic model equals the (average) integration step-size of the subsystems, see Sect. 6.6. The possible speed-up compared to a monolithic simulation may be improved even more, if the co-simulation is accomplished with a variable macro-step size (macro-step size controller). A further improvement may be achieved if a variable integration order is used (i.e., variable approximation order for the coupling variables). Applying an explicit co-simulation method, the possible speed-up may become even larger, since l = 0 for explicit co-simulations. It should be pointed out that for the case that r is chosen too large, the overhead will become the dominant part of the computation time so that the co-simulation will get inefficient [29,40].
Besides the directly obvious speed-up due to the decomposition into subsystems, which can be solved independently and in parallel within each macro-step, an improvement with respect to the simulation time may also be achieved due to the multi-rate effect. If the global model can be split into a small-sized stiff subsystem and a large-scaled non-stiff subsystem, only the smaller stiff subsystem has to be integrated with small subsystem integration-step sizes, while the larger non-stiff subsystem can be integrated very efficiently and stable with larger subsystem integration-step sizes.
Remark Parallelization of a dynamical system can be realized in different ways. Considering a monolithic dynamical model, which is discretized by an (implicit) time-integration scheme (e.g., BDF method, Runge-Kutta method), an algebraic system of equations has to be solved in each time step. Usually, parallelization is accomplished by solving the algebraic system of equations in parallel, i.e., the equation solver is parallelized. In contrast to this classical type of parallelization, a further parallelization can be achieved by making use of a co-simulation approach, as mentioned above. Hence, parallelization may be realized on two levels, namely i) by a decomposition of the overall model into subsystems, which are integrated in parallel within each macro-time step and ii) by applying the classical type of parallelization, i.e., by using parallelized subsystem solvers for solving the algebraic subsystem equations.
Here, we omit a detailed literature review on co-simulation and solver coupling techniques. The interested reader is referred to literature, see, e.g., [17,18,22,56].
Within a co-simulation approach, a macro-time grid (also called communication-time grid) has to be defined, i.e., macro-time points T i (i = 0, 1, 2, . . .) have to be specified, where the subsystems exchange the coupling variables. Between the macro-time points, the coupling variables are approximated, e.g., by means of extrapolation/interpolation polynomials of degree k. In the simplest case, an equidistant macro-time grid is used and the order of the approximation polynomials for the coupling variables is assumed to be constant. Using a co-simulation with constant order and constant macro-step size may, however, be rather inefficient. It has been shown in literature that the efficiency, accuracy, and stability of a co-simulation implementation may significantly be improved by incorporating a macro-step size controller, see [6,40,51,53,54].
In the current manuscript, the possible improvement of a co-simulation with variable integration order is investigated. For realizing a variable integration order, a strategy is required, which determines the optimal integration order. Therefore, the local truncation error within the macro-step from T N to T N+1 = T N + H N (H N denotes the current macro-step size) is estimated for different polynomial degrees k. The optimal polynomial order minimizes the estimated local error. Applying an appropriate order control algorithm, co-simulations with variable integration order can be realized. In this work, only co-simulation implementations with variable macro-step size are considered. It should therefore be stressed that the control algorithm for the integration order and the macro-step size control algorithm cannot be considered independently, since both influence each other. As a consequence, order and step-size have to be controlled simultaneously.
In the paper at hand, mechanical co-simulation models with an arbitrary number of subsystems are investigated. The developed order control algorithm may, however, also be used for nonmechanical systems. Considering mechanical systems, the coupling of the subsystems can be realized either by constitutive laws (coupling by applied-forces) [1,3,9,19,21,48,58,60] or by algebraic constraint equations (coupling by reaction forces) [5,23,30,39,55,57,59,61,62,72,73,75]. In this work, we restrict ourselves to the case of applied-force coupling approaches (explicit and implicit). Furthermore, we only consider parallel co-simulation implementations of Jacobi type. Sequential integration techniques (Gauss-Seidel type) [26,27] are not applied, since we focus on fully parallelizable models. To approximate the coupling variables between two macro-time points, Lagrange polynomials are used here. Other approximation approaches -acceleration based techniques [34,35], interface-model-based approaches [47], context-based approximation polynomials [7], methods applying a least square fit [42], etc. -are not considered in the current manuscript.
The new contributions of this manuscript can be summarized as follows: • An algorithm for controlling the order of the approximation polynomials for cosimulation and solver coupling methods is presented. • The developed order controller is imbedded into the macro-step size control algorithm.
• Efficiency, accuracy, and stability of a co-simulation implementation with variable integration order and variable macro-step size is analyzed with different numerical examples. • The large potential of parallelizing dynamical models with the help of co-simulation is demonstrated.
The manuscript is organized as follows: In Sect. 2, the basic idea of co-simulation and the relevant definitions are briefly summarized. The classical explicit and implicit co-simulation methods are shortly recapitulated in Sect. 3. The different error estimators, which are used in the current work for controlling the macro-step size, are collected in Sect. 4. The order control strategy and the combined algorithm for controlling the integration order and the macro-step size simultaneously are presented in Sect. 5. Numerical examples are given in Sect. 6. The paper is concluded in Sect. 7. Calculation of the interface-Jacobian, which is required for the implicit co-simulation approach, is treated in Appendix A. Some additional simulation results for Example 3 in Sect. 6.3 are collected in Appendix B.
2 Co-simulation model

General case: arbitrary mechanical system decomposed into subsystems
To demonstrate the basic strategy behind a co-simulation approach, a general mechanical system is considered, which is mathematically defined by the following first-order DAE system (differential-algebraic system of equations) The vector z contains the position variables q, the velocity variables v, and also the Lagrange multipliers λ; B represents a symmetric square matrix, which is not invertible in the general case, where Eq. (1) represents a DAE. For classical constrained multibody systems [12], for instance, the matrix B is given by where I denotes the identity matrix and M the mass matrix. In the ODE case, B is invertible. Next, the overall system is decomposed into r subsystems. To describe the coupling between the subsystems, appropriate coupling variables u have to be defined. Throughout this manuscript, we assume that the subsystems are connected by applied forces/torques (applied-force coupling) so that the coupling between the subsystems is described by constitutive laws. Constraint coupling, where the subsystems are connected by algebraic constraint equations and corresponding reaction forces (Lagrange multipliers) is not considered here. In the framework of a co-simulation approach, macro-time points T i (i = 0, 1, 2, . . . ) have to be defined. Macro-time points are discrete, not necessarily equidistant time points. With the macro-time points, the macro-step size H N = T N+1 − T N is defined. If a constant macro-step size H = T N+1 − T N = const. is used, H has to be specified by the user so that the co-simulation runs stable and provides accurate results. Applying an appropriate error estimator, the current macro-step size H N is determined by the macro-step size controller so that a user-defined error criterion is fulfilled. Between the macro-time points, the subsystems are integrated independently, since the subsystems only exchange information at the macro-time points. To carry out the subsystem integrations between the macro-time points, the coupling variables have to be approximated. In this work, Lagrange polynomials are used for approximating the coupling variables.
The states and multipliers of an arbitrary subsystem s ∈ {1, . . . , r} are arranged in the vector z s = (q s T , v s T , λ s T ) T . Note that the number of the subsystem is indicated by the superscript s. The equations of motion of subsystem s read The coupling variables u, which here represent input variables of the subsystems, can be defined implicitly by the coupling functions g c := u − ϕ (t, z, u) = 0. It should be mentioned that the subsystem coupling is, one the one hand, defined by the constitutive laws of the coupling forces/torques and, on the other hand, by the decomposition technique. Note that basically three different decomposition techniques may be distinguished, namely force/force-decomposition, force/displacement-decomposition, and displacement/displacement-decomposition, see [58,61,75] for more details. It should further be stressed that for applied-force coupling approaches, the coupling variables are only functions of q and v, i.e., the coupling functions may be expressed as g c := u−ϕ (t, q, v, u) = 0.
Making use of the coupling variables, the equations of motion of subsystem s can be rewritten as Collecting the right-hand sides of all subsystem equations into the vector f co (t, z, u) = (f co,1 t, z 1 , u T , . . . , f co,r (t, z r , u) T ) T , the decomposed co-simulation system can be formulated as with z = (z 1 T , . . . , z r T ) T . Note that in the framework of a co-simulation approach, the coupling conditions are only enforced at the macro-time points T N . Between the macro-time points, approximated coupling variables are used. Further details and examples concerning the decomposition of an overall system into subsystems can, for instance, be found in [29,40]. It should be emphasized again that co-simulation and the here investigated strategies for controlling the macro-step size and approximation order are not limited to mechanical systems.

Special case: system of coupled multibody models
In the subsequent sections, we focus on the special case that the subsystems are general constrained multibody systems. Then, the equations of motion (3) for an arbitrary subsystem s can be written asẋ The matrix H s defines the relationship between the (generalized) velocity variables v s and the (generalized) position variables x s ; M s denotes the symmetric mass matrix, and f s the vector of the applied and Coriolis forces. The algebraic constraint equations of the subsystem are collected in the vector g s (t, x s ) λ s with the constraint Jacobian G s = ∂g s ∂x and the vector of Lagrange multipliers λ s represents the constraint forces/torques generated by the subsystem constraint equations. The coupling forces/torques of subsystem s are arranged in the vector f c,s . A comparison with the general case of Sect. 2.1 shows that one gets for the special case of a system of coupled multibody models the relationships B s (t, z) = blockdiag(I , M s , 0) and f co,

Example: two multibody systems coupled by a flexible bushing element
As an example, two arbitrary unconstrained mechanical subsystems are considered. It is assumed that the subsystems are mechanically coupled by a linear viscoelastic bushing element (flexible atpoint joint) [58], see Fig. 1. More precisely, point C i of body i (center of mass S i , mass m i , inertia tensor J i ) belonging to subsystem 1 is connected to marker C j of body j (center of mass S j , mass m j , inertia tensor J j ) related to subsystem 2 by a linear three-dimensional spring/damper element.
The coupling points C i , C j are represented by the body fixed vectors r C i , r C j . Applying absolute coordinates, position and orientation of an arbitrary rigid body q is defined by the 3 coordinates of the center of mass r Sq and by 3 rotation parameters arranged in the vector γ q ∈ R 3 (e.g., 3 Bryant angles). The coordinates of the angular velocity ω q are associated with the time derivative of the rotation parameters by the general expressioṅ γ q = H γ q ω q with H ∈ R 3×3 . The equations of motion (Newton-Euler equations) for the coupling bodies i and j are given by

Coupling body i (subsystem 1):ṙ
In the above equations, f i and f j represent the resultant applied forces, which are acting on the two coupling bodies. The resultant applied torques are denoted by τ i and τ j .
The coupling force f c represents a spring/damper force, which is proportional to the relative displacement and to the relative velocity of the coupling points C i and C j ; f c can implicitly be described by the three coupling conditions where the diagonal matrix C = diag c x , c y , c z collects the three spring constants and the diagonal matrix D = diag d x , d y , d z the three damping coefficients. Note that the total time derivative with respect to the inertial system K 0 is denoted by d dt . Hence, the coupling forces and torques, which are acting on the two coupling bodies, read

Classical explicit and implicit co-simulation schemes
For approximating the coupling variables u within the macro-step T N → T N+1 , extrapolation/interpolation polynomials p N+1 (t) of degree k (local approximation order k + 1, i.e., O H k+1 ) are used. Since the subsystems only exchange information at the macrotime points, the coupling conditions are only enforced at the macro-time points T N , i.e., g c (t, z, u) = 0 is only fulfilled at the macro-time points. Between the macro-time points, the subsystems can be integrated independently. Applying a co-simulation approach, system (1) is replaced by the weakly coupled cosimulation system The new variables z N+1 are obtained by integrating Eq. (10) from T N to T N+1 with the initial conditions z N . It should be noted that for an explicit co-simulation method, the extrapolated coupling variables p N+1 (T N+1 ) will usually not satisfy the coupling equations at the new macro-time point T N+1 so that p N+1 (T N+1 ) − ϕ T N+1 , z N+1 , p N+1 (T N+1 ) = 0. Consistent coupling variables u N+1 can, however, simply be calculated by an update step. Therefore, the coupling equations u N+1 = ϕ (T N+1 , z N+1 , u N+1 ) have just to be solved for u N+1 . The update step will produce (small) jumps in the coupling variables, because p N+1 (T N+1 ) is usually not equivalent with the updated coupling variables u N+1 .

Explicit co-simulation scheme
The classical explicit co-simulation approach can be described very easily by considering the general macro-time step from T N → T N+1 [52,60,64,69]. We assume that at the beginning of the macro-time step, the variables of the subsystems (subsystem states and subsystem • In a second step, Eq. (10) is integrated from T N → T N+1 using the extrapolation polynomials p ext N+1 (t). This yields the variables z ext N+1 . In case of the explicit co-simulation scheme, the new variables at T N+1 are simply given by z N+1 := z ext N+1 . • In a final step, updated coupling variables u N+1 are computed. Therefore, the variables z N+1 are inserted into the coupling equations and Now, the next macro-time step T N+1 → T N+2 can be accomplished. The integration scheme is sketched in Fig. 2.
Note that for practical reasons, problems may occur in conjunction with feed-through systems. From the practical point of view, it may be complicated to perform the update step, e.g., if commercial simulation tools are used with reduced solver access. In this case, u N+1 = ϕ (T N+1 , z N+1 , u N+1 ) is just replaced by u N+1 = ϕ T N+1 , z N+1 , p N+1 (T N+1 ) . As a result, problems with the error estimator may arise (see [40] for more details). In the following, we therefore assume that updated coupling variables u N+1 are calculated.

Implicit co-simulation scheme
The classical implicit co-simulation scheme is based on a predictor/corrector approach, see, e.g., [58,60]. Again, the general macro-time step from T N to T N+1 is considered to explain the approach. The coupling variables u (t) are approximated by Lagrange polynomials of degree k. At the beginning of the macro-time step, the subsystem variables and the coupling variables are assumed to be known. . It should be mentioned that the predictor step is identical with the explicit cosimulation step of Sect. 3.1 (except for the update step).

Corrector iteration:
• Let u * N+1 denote arbitrary (initially unknown) coupling variables at the macro-time point T N+1 . Using u * N+1 , interpolation polynomials p * N+1 (t) for the coupling variables u(t) in the time interval [T N ,T N+1 ] can formally be generated with the help of the k + 1 sampling points T N+1 , u * N+1 , (T N , u N ) , . . . , (T N−k+1 , u N−k+1 ). Note that u N ,. . . ,u N−k+1 term the coupling variables at the previous macro-time points, which are assumed to be known.
By solving the linear system (11) for u N+1 , corrected coupling variables u cor,1 N+1 are obtained.
To compute corrected subsystem variables z cor,1 N+1 , interpolation polynomials p cor,1 N+1 (t) for the coupling variables u(t) are generated by using the k + 1 sampling points T N+1 , u cor,1 N+1 , (T N , u N ) , . . . , (T N−k+1 , u N−k+1 ). Now, the subsystem integration from T N → T N+1 is repeated with the polynomials p cor,1 N+1 (t), which yields the corrected subsystem variables z cor,1 N+1 . • Considering the αth corrector step, the (α−1)th corrector point is used as expansion point for the Taylor series expansion and the above procedure is repeated.
• The procedure is repeated l times, until a convergence criterion is fulfilled. The final variables u N+1 := u cor,l N+1 are the corrected variables. • By setting z N+1 := z cor,l N+1 , the next macro-step from T N+1 → T N+2 is accomplished.
The implicit co-simulation scheme is illustrated in Fig. 3. Some remarks on the different possibilities to calculate/approximate the interface-Jacobian dg c du *

N+1
can be found in Appendix A. In the following sections, basically two methods are used and compared, namely the calculation of the interface-Jacobian by a finite difference approach (Appendix A.1) and the approximate computation of the interface-Jacobian based on a low-order expansion technique (Appendix A.2).

Error estimators for the classical explicit and implicit co-simulation schemes
The efficiency and accuracy of a co-simulation may significantly be improved by using a macro-step size controller so that a nonequidistant communication-time grid can be realized. Therefore, error estimators are required, which can be used to estimate the numerical error produced by the co-simulation approach, i.e., the error introduced by the polynomial approximation of the coupling variables. In [40], three different error estimators for the classical explicit co-simulation approach (called ExLocal, ExMilne, ExSLocal) and two error estimators for the classical implicit co-simulation approach (called ImMilne, ImSLocal) have been derived. The basic idea for the construction of an error estimator, which estimates the local error generated in the macro-step T N → T N+1 , is to compare two different solutions at T N+1 : the state variables q N+1 , v N+1 (or, alternatively, the coupling variables u N+1 ) of the actual co-simulation generated with the classical explicit/implicit approach are compared with reference variablesq N+1 ,v N+1 (or alternatively u ext for the explicit and u N+1 − u pre N+1 for the implicit co-simulation) can be used to construct an error estimator. In this section, the main formulas for the five error estimators are shortly recapitulated. For a detailed description, we refer to [40]. It should be pointed out that the error estimators used here are only valid for continuous co-simulation systems. Considering discontinuous systems, systems with switches, etc., a modified error estimation strategy has to be applied.
In this work, the classical explicit and implicit co-simulation schemes are considered, see Sect. 3. Using extrapolation/interpolation polynomials of degree k (i.e., polynomial order k + 1), the coupling variables are approximated with O H k+1 . Since the coupling variables and therefore the accelerations are approximated with O H k+1 , the velocities v N+1 will converge with O H k+2 and the position variables q N+1 with order O H k+3 for both the explicit and implicit scheme. Hence, the following estimates hold: where q (T N+1 ) and v (T N+1 ) denote the analytical solutions with q (T N ) = q N and v (T N ) = v N . As can be seen, the integration order of the co-simulation may directly be controlled by controlling the polynomial degree of the approximation polynomials (see Sect. 5.2). Note that the error bounds of Eq. (12) are only valid for the case that the errors resulting from the numerical subsystem integrations can be neglected, i.e., strictly speaking for the case that the subsystems are integrated analytically.

Error estimator 1 for explicit co-simulation (ExLocal)
The first error estimator for the classical explicit co-simulation is based on a local extrapolation technique [65,71] and is denoted by ExLocal. In order to calculate an error estimator for q N+1 and v N+1 , comparative solutionsq N+1 andv N+1 are needed. If a local extrapolation technique is applied for deriving an error estimator, two solutions with different convergence orders are compared. For this purpose, two parallel co-simulations are carried out from T N to T N+1 , whereby the comparative solution is calculated with the same initial conditions (ẑ N = z N ): • The firstsimulation (actual co-simulation) is accomplished with the extrapolation polynomials p ext ) of degree k (approximation order k + 1). This integration yields the variables q N+1 and v N+1 .
• The second simulation (comparative solution) is carried out with the extrapolation polynomialsp ext The two co-simulations are illustrated in Fig. 4. Within the local extrapolation technique, the local error of the co-simulation in the macro-step from T N → T N+1 can very easily be estimated by the difference of the two solutions. Considering the first simulation, the local errors of the position variable q i and the velocity variable v i are defined by Substituting the analytical solutions q i (T N+1 ) and v i ( Therefore, it is possible to use the expressions

Error estimator 2 for explicit co-simulation (ExMilne)
The second error estimator for the classical explicit co-simulation approach is indicated by ExMilne and based on a Milne device approach [41]. Within this approach, two different solutions with the same convergence order but different leading error terms are compared. Again, two parallel co-simulations from T N to T N+1 are executed with the same initial conditions (ẑ N = z N ): • The first simulation (actual co-simulation) is carried out with the extrapolation polynomials p ext The two co-simulations are illustrated in Fig. 5 for the case k = 1. Note that q N+1 andq N+1 are calculated with polynomials of degree k and locally convergence with O H k+3 . In [40] (Appendix A.2), it is shown that the local errors of q N+1 andq N+1 may be written as where H N = T N+1 − T N represents the current macro-step size. The error constants C k pos,N+1 and C k+1 pos,N+1 can be expressed as special integrals of the Lagrange basis polynomials L k (t), namely The matrix A co N represents a special Jacobian matrix, which does not explicitly depend on the macro-step size H N . The vector p N+1 is also not explicitly depending on H N and may be calculated by the difference of two polynomials. A direct calculation of the leading error term H 2 N C k+1 pos,N+1 A co N p N+1 in Eq. (16) is not possible. The leading error term may, however, be determined in a straightforward manner with the help of the two solutions q N+1 andq N+1 . Subtracting Eq. (16a) from Eq. (16b) and multiplying by C k+1 pos,N+1 /C k pos,N+1 gives which converges to the local error e pos i,N+1 with order O H k+4 . Also, an error estimator ε vel i,N+1 for the velocity variables can be derived, namely which approximates the local error e vel

Error estimator 3 for the explicit co-simulation (ExSLocal)
The third error estimator for the classical explicit co-simulation approach is also constructed with the help of the local extrapolation technique and termed by ExSLocal. With the error estimators of Sects. 4.1 and 4.2, the local errors e of the state variables q and v can be estimated. In an alternative approach, the quality of the co-simulation is not directly measured by considering the local error of the state variables, but by regarding the local error of the coupling variables u. The coupling variables are generally defined by the implicit coupling conditions u − ϕ (t, q, v, u) = 0 and may therefore simply be considered as a nonlinear function of q and v. Instead of using e pos i,N+1 and e vel i,N+1 for measuring the quality of the co-simulation, the local error u i,N+1 − u i (T N+1 ) of the coupling variables may be considered.
Applying the classical explicit co-simulation scheme, the extrapolated coupling variables p ext ) are used for integrating the subsystems. After the subsystem integration from T N to T N+1 has been accomplished, updated coupling variables u N+1 are computed at the new time point T N+1 . Therefore, the new state variables q N+1 , v N+1 are inserted into the coupling equations and the relationship The predicted coupling variables u ext may therefore be used to construct an error estimator based on the local extrapolation technique. It has been shown in [40] that the local error e u i,N+1 = u ext i,N+1 − u i (T N+1 ) of the predicted coupling variable u ext i,N+1 can be estimated with the help of the updated coupling variable u i,N+1 according to The estimator ε u i,N+1 converges with order O H k+2 to the local error e u i,N+1 if consistent coupling variables u i,N+1 are used. Hence, this estimator can only be applied, if updated coupling variables are calculated at T N+1 by solving It should be mentioned that for systems without feed-through, consistent coupling variables can be calculated explicitly in a very straightforward manner with the help of the new variables q N+1 and v N+1 by simply evaluating In this case, ε u i,N+1 will in general not converge with O H k+2 to the local error e u i,N+1 . As a consequence, Eq. (21) will not represent a mathematically sound error estimator for the predicted coupling variables.

Error estimator 4 for the implicit co-simulation (ImMilne)
The implicit co-simulation scheme according to Sect. 3.2 is based on a predictor/corrector approach. With the help of the Milne device technique, the difference between the predicted and corrected state variables can be used to construct an error estimator [40]. The local error of the corrected position variables q i,N+1 can be estimated by where C k pos,N+1 are the error constants defined in Eq. (17). The estimator ε pos i,N+1 converges to the local error e A similar approach for the velocity variables yields the error estimator where C k vel,N+1 denote the error constants of Eq. (20). The estimator ε vel i,N+1 approximates the local error e vel

Error estimator 5 for the implicit co-simulation (ImSLocal)
This error estimator, which is based on a local extrapolation approach, can be used to monitor the error of the predicted coupling variables u pre N+1 , i.e., to estimate the local er- . As shown in [40], the difference between the predicted and corrected coupling variables, i.e., converges with order O H k+2 to the local error e u i,N+1 .

Macro-step size and order control algorithm
The here proposed algorithm for controlling the macro-step size and the integration order (i.e., the polynomial degree k of the approximation polynomials for the coupling variables) is carried out in three steps. The suggested approach may be interpreted as a straightforward generalization of step size and order control strategies well-established in the field of classical time integration [8,15,67,68]. In the first step, it is checked, whether the macro-step can be accepted or has to be repeated since the estimated error is too large (Sect. 5.1). For that purpose, the error estimators for the local error of Sect. 4 are used. The second step contains the algorithm for the selection of the optimal integration order (Sect. 5.2). In the third step, the optimal step size for the next macro-step is determined (Sect. 5.3).

Step 1: Local error test
After the macro-step T N → T N+1 with T N+1 = T N + H N has been accomplished, the local error is estimated with the help of one of the entire five methods suggested in Sect. 4. The estimated error ε N+1 -more precisely the weighted root mean square norm of ε N+1 -has to pass the subsequent local error test Note that n * designates the number of variables, which are used for controlling the macrostep size. Basically, three alternatives exist. First, all state variables of all subsystems may be considered in the sum of Eq. (25). Second, only the state variables of the coupling bodies are taken into account in the summation. Third, in connection with the error estimators of Sect. 4.2 and Sect. 4.5, the summation is carried out over the n c coupling variables (i.e., n * = n c ). The weights W i,N+1 depend on the user-defined relative and absolute error tolerances rtol and atol i and also on the magnitude of the variables, which are used for the error estimation. Hence, essentially two cases have to be distinguished: i) The error of the state variables is estimated (ExLocal, ExMilne, ImMilne); ii) The error of the coupling variables is estimated (ExSLocal, ImSLocal).
In case i), the error analysis is carried out separately on position level and on velocity level. This is reasonable, since the position and velocity variables exhibit a different convergence behavior, see Sect. 4. Hence, in case i) the local error test with the weights W pos i,N+1 = atol is applied. Principally, all subsystem state variables can be used for the error estimation, i.e., the index i in Eq. (25) runs over all bodies. Alternatively, only the states of the coupling bodies are considered for the error estimation so that the index i only runs over the coupling bodies. It should be mentioned that the value atol i can be chosen individually for each component, while rtol is identical for all components. It is clear that the proper choice of atol i and rtol is problem-dependent and strongly affects the accuracy and the efficiency of the co-simulation.
In case ii), the local error test reads with the weights W u i,N+1 = atol u i + rtol · u i,N+1 . The index i in Eq. (25) runs over all coupling variables. Now, two possibilities have to be considered: a) The local error test is passed, i.e., condition (26) or condition (27) is fulfilled. Then, the next macro-step T N+1 → T N+2 with T N+2 = T N+1 + H N+1 can be carried out. b) The local error test is not passed. In this case, a repetition of the current macro-step with a reduced step sizeH N has to be accomplished.
For a concise notation, both possibilities are combined. Therefore, the macro-step carried out next is denoted by In short, the index M is set to M := N + 1 if the local error test is passed. If the local error test has failed, the index M is set to M := N .

Step 2: optimal integration order and order control algorithm
The task consists of the selection of the optimal polynomial degree k for the approximation polynomials for the next macro-step T M → T M+1 . Note again that the next macro-step is either a new macro-step (case a)) or a repetition of the current macro-step (case b)).

Order control algorithms for BDF solvers a) Basic idea:
The order control algorithm applied here for calculating the optimal polynomial degree k for the approximation polynomials is inspired by well-established order control approaches used in connection with BDF solvers. Regarding variable-order BDF solvers, the optimal integration order is determined by considering the leading error term in the remainder of a Taylor series expansion. The remainder of the Taylor series is used for estimating the local truncation error (LTE). The local truncation error of an arbitrary time integration method of order k depends on the (k + 1)th derivative of the solution vector z and on the solver step size h [15,67] and reads Note that c k+1 terms the leading error constant, which generally depends on the numerical integration method. Considering concretely the time step from t N to t N+1 = t N + h carried out with a BDF integrator of order k, the local truncation error reads LT E is unknown at the beginning of the time step, the term z (k+1) (t i+1 ) is replaced by the (k + 1) st derivative of a predictor polynomial P pre The integration order for the next solver step is determined with the aim to maximize the integration step size. Therefore, the local truncation error norms , and LT E (k + 1) ≈ c k+2 h k+2 z (k+2) are considered. The integration order, which provides the lowest estimated truncation error and consequently allows the largest step size, is used for the next step [50].
If higher order integration methods are used (k > 2), the order selection algorithm is usually extended by a heuristic approach to detect numerical instabilities. The heuristic approach suggested in [68] is based on the idea that oscillations with high frequencies and growing amplitudes will arise, if the simulation gets unstable. As a consequence, the local truncation error will oscillate rapidly with increasing magnitudes, when solver instabilities occur.

b) Modified approach:
A slightly modified approach for controlling the integration order, which is, for instance, implemented in the IDA solver of the SUNDIALS package [25], has been suggested in [8]. Note that IDA is a variable-order, variable-coefficient BDF solver for the solution of implicit differential-algebraic equation systems. The modified approach also considers the leading error term of the remainder of a Taylor series expansion. However, the leading error term is scaled to be independent of the error constant. Within the control algorithm proposed in [8], the three terms h k−1 z (k−1) , h k z (k) , and h k+1 z (k+1) are compared. If these three terms do not form a decreasing sequence, i.e., if the condition h k−1 z (k−1) > h k z (k) > h k+1 z (k+1) is not fulfilled, the integration order is decreased to k − 1. If the condition is fulfilled and if also h k+1 z (k+1) > h k+2 z (k+2) is satisfied, the order may be increased to k + 1. Numerical studies have shown that applying the modified approach, the integration order is reduced sooner in case of numerical instabilities compared to the approach described in a).

Order control algorithm for co-simulation
The order control algorithm implemented here for controlling the integration order of the cosimulation may be regarded as a straightforward adaption of the order control strategy suggested in [8] for BDF solvers. Within each macro-step, extrapolation polynomials P is fulfilled, the co-simulation is continued with the current polynomial degree k. If the condition is not fulfilled, the polynomial degree is reduced to k − 1. If the condition (30) is is computed. The polynomial degree will be in- is fulfilled. The complete order control algorithm implemented here is summarized in the subsequent scheme and in the flowchart of Fig. 6 and is mainly based on the order controller of the IDA solver [25]: • Check, if polynomial order has to be decreased: -  (SDN (k + 1) , SDN (k + 2)): * k new is not adjusted

Macro-step size controller
After the polynomial degree k := k new for the next macro-step has been determined, the algorithm for the macro-step size selection is executed. The macro-step size controller computes a positive, real-valued scaling factor r ρ . This factor is used for calculating the new macrostep size H M for the next macro-step T M → T M+1 (see Eq. 1) The two consecutive macro-steps T N → T N+1 and T M → T M+1 are carried out with the same polynomial degree. In this case, the error estimator ε N+1 according to Sect. 4, which has been used for the local error test in Eq. (25), is also applied for calculating the new macro-step size H M . 2) After the macro-step T N → T N+1 has been accomplished, the polynomial degree is changed by the order control algorithm. In this second case, the error estimator ε N+1 cannot be used to calculate the step size for the next macro-step. Hence, the alternative a priori error estimator is used for the next macro-step T M → T M+1 . The two vectors P k+1 . It should be mentioned thatε u M+1 simply estimates the local error of the predicted coupling variables.

Remark 1
The a priori error estimatorε u M+1 is only used to determine the new macro-step size H M . After the macro-step T M → T M+1 has been accomplished with H M , the error is again checked with the a posteriori error estimator ε M+1 according to Sect. 4. Hence, acceptance of a macro-step, see Eq. (25), is always based on the a posteriori error estimator.

Remark 2
The a priori estimator only monitors the error of the coupling variables. Using the error estimators ExLocal, ExMilne, ImMilne, however, the error of the state variables is monitored. As a consequence, the a priori estimator and the a posteriori estimator may be based on different variables.
Considering case 1), the scaling factor r ρ for the new (optimal) macro-step size H M is calculated by one of the following two relations: Relation (α) is used for the error estimators ExLocal, ExMilne and ImMilne, which are based on state variables; (β) is applied in connection with the error estimators ExSLocal and ImSLocal, which are based on coupling variables. Regarding case 2), the scaling factor r ρ for the new macro-step size H M is computed bỹ The exponents in Eqs. (32) and (33) refer to the convergence order of the co-simulation method, which is k + 3 on position level and k + 2 on velocity level for the here considered explicit and implicit co-simulation approaches. The coupling variables converge with order k + 1. A detailed derivation of Eqs. (32) and (33) can be found in [40]; SF is a user-defined safety factor, e.g., SF ∈ [2, 6]. The safety factor has to be chosen carefully. If the safety factor SF is chosen too small, the number of macro-step repetitions may increase significantly due to the increased number of failed local error tests. Choosing SF too large may entail too small macro-step sizes. If both the error of the position variables and the error of the velocity variables are estimated (ExLocal, ExMilne, ImMilne), the scaling factor is here chosen conservatively by means of r ρ = min r pos , r vel . Summarizing, the scaling factor r ρ is determined according to simulation is aborted and an error message is printed.
It should be noted that the error tolerances atol i and rtol for the co-simulation as well as the safety factor SF have to be specified by the user. Generally, they are problem-dependent.
The macro-step size control algorithm described above may be interpreted as an Icontroller. Alternatively, a PI-controller may be used by incorporating the previous errors ε pos i,N and ε vel i,N [24].

Numerical examples
The co-simulation models considered here consist of the subsystems and the co-simulation interface. The communication between the co-simulation interface and the subsystems is A flowchart of the parallel implementation of the explicit co-simulation method with the macro-step size and order control algorithm is shown in Fig. 8. A corresponding flowchart of the implicit co-simulation approach is depicted in Fig. 9.
It should be mentioned that the co-simulations have been executed fully parallelized on a high-performance supercomputer [37] with Intel Xeon Platinum 9242 processors. Each subsystem integration has been accomplished on an individual core; the different cores are connected over an InfiniBand interconnect. Note that the computation times shown in the subsequent figures reflect the wall-clock time (including communication latency).

Example 1: Nonlinear two-mass oscillator
As a first example, we consider the nonlinear two-mass oscillator shown in Fig. 10 (masses m 1 , m 2 ; spring constants c 1 , c 2 ; damping coefficients d 1 , d 2 ). The two masses are coupled Fig. 9 Flowchart of the implicit co-simulation approach with the macro-step size controller ImMilne (interface Jacobian computed by a finite difference approach requiring additional subsystem integrations with perturbed approximation polynomials) Furthermore, a penalty-based contact force F C = −10 −2 N · e (10 7 m −1 ·x 1 ) is assumed to act at m 1 . Applying a force/force-decomposition approach, the coupled co-simulation system is defined by the following semi-explicit index-1 DAE system Coupling condition: The simulations have been carried out with the subsequent parameters:  Figure 11 depicts the displacement x 1 (t) and the corresponding velocity v 1 (t) for the different error estimators of Sect. 4 in combination with the order control algorithm of Sect. 5. The reference solution has been generated with a monolithic model, which has been integrated with very small error tolerances. The macro-step sizes for the different error estimators are shown in Fig. 12; the degree k of the approximation polynomials is depicted in where z ref ∈ R N×Nz denotes the reference solution; N z terms the number of state variables, and N is the number of output points.

Example 2: Comparative study between explicit and implicit co-simulation approaches
A chain of n K masses is considered, see Fig. 15. The masses, described by the displacement coordinates x i and the velocities v i =ẋ i , are connected by nonlinear spring/damper elements. The spring/damper element depicted in Fig. 16, which connects the two masses m i−1 and m i , is defined by the constitutive equation where c i , d i , C i , D i characterize the linear and nonlinear stiffness and damping parameters. The chain model is used here because it is simple to scale with respect to the number of degrees of freedom. Also, the number of subsystems can be changed very easily. Therefore, the model is well suited to carry out reliable numerical studies in order to compare efficiency and accuracy of different co-simulation methods. The model is separated into r subsystems via force/force-decompositions. The n c = r − 1 coupling conditions are defined by the constitutive equations of the coupling spring/damper Fig. 16 Coupling force λ s between the last body of subsystem s and the first body of subsystem s + 1 elements. Regarding the two adjacent subsystems s and s + 1, the coupling condition g s is given by At special time points t α (t α = 0, 0.05 s, 0.1 s, . . . ), impulse-shaped external forces . The subsystem integrations have been accomplished with a variable step-size BDF integrator [25] with the solver settings rtol BDF = 1.0E − 6, atol BDF x = 1.0E − 9, and atol BDF v = 1.0E − 6. The co-simulations have been carried out with the macro-step size and order control algorithm described in Sect. 5. Simulation results are presented for the two error estimators ExSLocal and ImSLocal, see Sect. 4, with the subsequent macro-step size controller parameters: SF = 6.0, atol λc = 10 −1 , rtol λc = 10 −3 , r ρ min = 0.5, and r ρ max = 2.0.

Case study 1
The main intention of Example 2 is to compare the numerical performance of the explicit with the implicit co-simulation approach. Giving general statements is, of course, not possible since the performance of a co-simulation approach strongly depends on the system and the model parameters. For the considered model, our simulations have, however, shown that the damping coefficients are crucial and important parameters for comparing the efficiency of explicit and implicit methods. Within a first numerical study, the damping parameter d i  Figure 17 shows displacement and velocity of mass 6891 for d i = 1.0E4 Ns/m simulated with the explicit co-simulation approach (variable co-simulation order; variable macro-step size with error estimator ExSLocal). Although the system is rather stiff and although steep gradients occur due to the impulse-shaped excitation, the co-simulation is stable and the results are accurate and close to the reference solution. Figure 18 collects simulation results accomplished with the implicit co-simulation method (variable co-simulation order; variable macro-step size with error estimator ImSLocal). The same model parameters have been used as for the explicit co-simulations depicted in Fig. 17; only the damping parameter has been increased to d i = 5.0E5 Ns/m. Again, a good agreement between the co-simulation and the reference solution can be observed. Figure 19 shows a direct comparison between the explicit (ExSLocal) and the implicit (ImSLocal) co-simulation approach for the case d i = 1.0E4 Ns/m. The plots illustrate the macro-step size as well as the polynomial degree of the approximation polynomials over the simulation time t . As can be seen, the (average) macro-step size is (slightly) smaller for  The same study for the case d i = 5.0E5 Ns/m is collected in Fig. 20. Here, the (average) macro-step size is markedly smaller for the explicit approach indicating that the implicit cosimulation approach is more efficient for the higher-damped case with d i = 5.0E5 Ns/m. Again, the order control algorithm works well for both co-simulation methods. Figure 21 exhibits the computation time, the number of macro-steps and the global error of the explicit and implicit co-simulation approach as a function of the damping coefficient d i . It can clearly be observed that the implicit co-simulation approach becomes more efficient for increasing values of d i . The explicit co-simulation is faster for d i ≤ 1.0E5 Ns/m ( Fig. 21(a)), although smaller macro-step sizes are needed in connection with the explicit approach ( Fig. 21(b)). Note that the implicit approach requires a computationally expensive corrector iteration. As a consequence, the implicit approach will only be more efficient than the explicit approach, if the macro-step sizes for implicit method are significantly larger than for the explicit method. The resulting global error of the state variables of the coupling bod-  Fig. 21(c). As can be detected, the error of the implicit approach is usually little smaller, although both methods have been simulated with the same error tolerances for the macro-step size controller.

Case study 2
In a second numerical study, the performance of the explicit approach is compared with the performance of the implicit co-simulation method for different values of the stiffness coefficient c i , see Fig. 22. The simulations have been carried out with the same parameters that have been used in the first case study. The damping coefficient has been set to d i = 1.0E4 Ns/m. As can be seen, the explicit approach is always faster than the implicit method.

Case study 3
Within the third study, the explicit and implicit co-simulation methods are compared for different values of m i , see Fig. 23. The same parameters have been used as in the first case study. The damping and stiffness coefficients have been set to d i = 1.0E4 Ns/m and c i = 1.0E7 N/m. The plots clearly indicate that the implicit approach is more efficient than the explicit method for small values of m i (m i ≤ 8.0E − 3 kg).

Example 3: comparative study between co-simulations with constant and variable integration order
Next, the performance of co-simulations including the order controller (k = var.) is compared with the performance of co-simulations using a fixed polynomial degree for the approximation polynomials (k = const.). The chain model of Example 2 is considered again and basically the same parameters are used. In Sect. 6.3, all explicit co-simulations are carried out with d i = 1.0E4 Ns m and all implicit co-simulations with d i = 5.0E5 Ns m . The following macro-step size controller parameters have been used: SF = 6, atol x = 10 −3 · rtol, atol v = rtol, atol λc = 10 2 · rtol, r ρ min = 0.5, r ρ max = 2.0. In the subsequent simulations, rtol is varied in order to show the influence of rtol on the global error and on the number of macro-steps.
Simulation results with the five different error estimators of Sect. 4 are depicted in Figs. 24-28. The plots show the total number of macro-steps and the resulting global error of the state variables of the coupling bodies. The following observations can be made: • The order controller reduces the number of macro-steps for all considered co-simulation approaches and for all chosen error tolerances. An exception is only observed for the  , it can be seen that for k ≥ 2 the number of macro-steps is almost constant and not influenced by the error tolerance. Usually, one would expect that the number of macro-steps will decrease with increasing rtol. This unexpected behavior might be traced back to a stability problem [60]. Most likely, the cosimulations for k ≥ 2 will get unstable for larger macro-step sizes. Therefore, the macrostep size controller reduces the macro-step size, until the system reaches a numerically stable region. • Regarding the plots a) of Fig. 27 and Fig. 28, it can be observed that for k ≥ 3 the number of macro-steps will not be reduced if the error tolerance rtol is increased. This unexpected behavior might also be explained by the fact that the stability limit of the co-simulation is reached for k ≥ 3.  i) Subsystem integration time (the slowest subsystem integration process is relevant); ii) Additional subsystem computations resulting from the co-simulation (save/reload of subsystem workspace in connection with macro-step repetitions); iii) Special computations of the co-simulation interface, specific for the different cosimulation methods (e.g., macro-step size controller, computation of corrected coupling variables); iv) Synchronization and data exchange between the subsystems and the co-simulation interface (MPI data traffic).
The four parts of the overall computation time are represented with different colors in Figs. 29-33.

Example 4: comparative study between finite difference gradients and low-order sensitivities
Next, the system size is increased by adding further masses to the chain. The model is decomposed into r = 2399 subsystems and simulated by the fully controlled, i.e., order and macro-step size controlled, ExSLocal and ImSLocal co-simulation approaches. The subsequent macro-step size controller parameters have been used: SF = 6, atol λc = 10 −1 , rtol λc = 10 −3 , r ρ min = 0.5, r ρ max = 2. Due to the increased number of subsystems, the number of coupling variables will also increase.
Basically, the interface-Jacobian required for implicit co-simulation approaches, see Eq. (11), can always be calculated numerically using a finite difference approach (FD), see Appendix A.1. The additional subsystem integrations with perturbed coupling variables, which are required for the finite difference approach, can be executed in parallel with the main subsystem integration (predictor/corrector integration). Due to the parallel integration, the overall computation time of the co-simulation will therefore not be (markedly) in- 34 ExSLocal approach (for d i = 1.0E4 Ns/m) and ImSLocal approach (for d i = 5.0E5 Ns/m) in connection with low-order sensitivities: number of macro-steps and computation time for different model sizes creased. However, additional cores are required for carrying out the subsystem integrations with the perturbed approximation polynomials. The larger the number of coupling variables becomes, the larger the number of additional cores will be. Hence, for models with a large number of coupling variables, the FD approach for calculation the interface-Jacobian will get problematic or even impossible. In this case, the gradients can very easily be approximated by making use of a low-order approximation approach (low-order sensitivities, LOS), see [29]. For the considered model, for instance, 3 · r − 2 + 1 = 7196 cores are required in total if an FD approach is used to calculate the interface-Jacobian. With the low-order approximation approach, only r + 1 = 2400 cores are needed, which equals the number of required cores for a fully parallelized explicit co-simulation approach (ExSLocal). A description of the FD approach and a short summary of the LOS approach are collected in Appendix A.
In Example 4, the interface-Jacobian required for the implicit co-simulation approach (ImSLocal) is therefore calculated with the LOS approach. It should be mentioned that the interface-Jacobian for the examples in Sects. 6.1-6.3 have been calculated with the FD approach. For the example in Sect. 6.3, simulation results generated with the LOS technique are arranged in Appendix B so that implementations with FD and LOS may directly be compared.
Co-simulation results using the chain model of Sect. 6.2 with n K = {5E4,1E5,5E5,1E6} masses are depicted in Fig. 34. The simulations have shown that the average macro-step size is slightly increased compared to the case n K = 1E4 of Sect. 6.2. As a consequence, the computational overhead caused by the co-simulation interface (green) and also the overhead caused by the data traffic between the co-simulation interface and the subsystems (yellow) is slightly decreased compared to the case n K = 1E4 in Sect. 6.2. The solver time per subsystem (blue) is increased, because the number of masses per subsystem is increased. Regarding the efficiency of the implementation, it can clearly be seen from Fig. 34 that a decomposition of the model into 2399 subsystems is inefficient for a model size with n K = 5E4 and n K = 1E5 masses, since the overhead is rather large compared to the overall computation time. If the number of masses increases, the overhead becomes lower compared to the subsystem integration time. This clearly indicates that a parallelization of a dynamical model by means of a co-simulation technique can very significantly reduce the overall simulation time. Increasing the number of masses even more (n K > 1E6), the improvement would even be more apparent.

Example 5: Study on the safety factor
The safety factor SF, see Eq. (32), is a parameter to tune the ratio between the total number of macro-steps and the number of failed macro-steps (failed local error tests). Again, the chain model of Sect. 6.2 is considered with the following parameters: n K = 10 5 masses, r = 100 subsystems, The model is computed by the explicit and the implicit co-simulation approach with quadratic approximation polynomials (k = 2). The macro-step size is controlled by the error estimators ExMilne and ImMilne. The limits for the scaling factor r ρ between two consecutive macro-steps are chosen by r ρ min = 0.5, r ρ max = 1.5. The error tolerances for the macrostep size controller are set to atol x = 10 −8 , atol v = 10 −4 , and rtol = 10 −4 . The resulting computation time as a function of the safety factor SF is illustrated in Fig. 35.
For the chosen model parameters, the explicit co-simulation approach is more efficient than the implicit method: the ExMilne approach (left y-axis, blue bars) is almost twice as fast as the ImMilne method (right y-axis, red bars). The computation time of both co-simulation approaches can be reduced significantly by selecting an appropriate value for the safety factor SF. Choosing the safety factor in the range SF ∈ [4,6] instead of using SF = 1, the computation time can approximately be reduced by 30%. The plot further indicates that the simulation time only slightly changes if the safety factor is varied in the range SF ∈ [3,8]. It should be emphasized that these findings are model dependent. Further studies, not presented here, indicate that a safety factor in the range SF ∈ [2,6] might generally be an appropriate choice [28]. with an amplitude of F j = (−1) j +1 · 1.0E6 N and the parameters δ = 1.0E − 6 s and t = 1.0E − 3 s are applied to the coupling bodies at the time points t α (t α = 1.0E − 4 s, 0.1 s, 0.2 s, . . . , 0.9 s). Note that contact forces F Con,i are not considered in the simulations of Sects. 6.6.1 and 6.6.2; however, in Sect. 6.6.3 they are considered. The subsequent macro-step size controller parameters have been used: SF = 6, atol x = 10 −3 · rtol, atol v = rtol, atol λc = 10 2 · rtol, r ρ min = 0.5, r ρ max = 2.0. The model is computed with the explicit and implicit co-simulation approaches, whereby three different cases are considered: a) fixed order (k = const.) and fixed macro-step size (H = const.), b) fixed order (k = const.) and variable macro-step size (H = var.), c) variable order (k = var.) and variable macro-step size (H = var.).
The simulation results are compared with the results of a reference simulation (monolithic simulation with very tight error tolerances). In order to compare the simulation results, the global error e glob (normalized root mean square error) of the state variables of the coupling bodies is computed. To obtain a reasonable comparison, the simulation parameters have been chosen in such a way that the accuracy condition e glob ≤ 2.50E − 3 will be satisfied.
To achieve a global error of e glob mono ≈ 2.34E − 3, a simulation time of 2.69E4 s is required for the monolithic model (mean solver-step size: h = 1.34E − 6 s).
• To obtain results of the required accuracy with the implicit co-simulation method, a simulation with k = 2 and H = 1.0E − 7 s (Im2H) exhibits the best performance. With these parameters, the implicit co-simulation takes 2.0E4 s with a global error of e glob impl ≈ 1.07E − 3. • Using the explicit co-simulation approach, best results are achieved with the parameters k = 1 and H = 2.5E − 8 s (Ex1H). The computation time with the explicit approach is 1.89E4 s yielding a global error of e glob expl ≈ 7.94E − 4. • Remark. A comparison between the monolithic simulation (Mono) and the explicit cosimulation (Ex1H) shows that the simulation time is only slightly reduced (≈ 30%) due to the co-simulation. The poor performance of the co-simulation can be explained as follows.
The model contains a few impulse-shaped external forces, which require a small macrostep size in order to obtain a stable co-simulation. The monolithic model is, however, solved with a variable step-size solver so that the integration step-size is only reduced at the time points, where the impulse-shaped forces act. Hence, the mean integration stepsize of the monolithic model is significantly larger than the mean integration step-size of the subsystem solvers. As a consequence, the formula in the introduction for estimating the possible speed-up by a co-simulation cannot be applied.   Fig. 36.
Based on the above results, the following conclusions can be drawn: • Co-simulations with a fixed macro-step size are rather inefficient for the considered model. Compared to the monolithic simulation, the computation time is only reduced by ≈ 30%. • Applying the macro-step size controller, efficiency of all co-simulation approaches may significantly be increased. A speed-up factor of ≈26.5 compared to the monolithic computation may be achieved. • For the considered model, the order control algorithm does not significantly reduce the computation time since the impulse-shaped forces only act at very few time points. At the time points, where the impulse-shaped forces act, the approximation order is -as expected -reduced. Between these time points, the order controller selects a larger integration order . Therefore, the benefit of the order controller is rather low for the considered problem. • The three error estimators for the explicit co-simulation approach exhibit a similar behavior. • The two error estimators for the implicit co-simulation method yield comparable results.

Case study 2: Modified decomposition
The decomposition of the chain model used above may not be ideal because the coupling masses are directly affected by impulse-shaped forces. To reduce the computation time, it might be useful to apply a modified decomposition. Therefore, the chain model is repartitioned into 101 subsystems in such a way that the first and the last subsystem contain 500 masses; the remaining 99 subsystems are assumed to be equally sized containing 1000 masses. The influence of the increased number of subsystems (101 instead of 100) on the simulation time is negligible. The essential point is that the impulse-shaped forces will not longer act directly at the coupling bodies, but inside the subsystems 2-100.
The results of the computations with the modified decomposition are collected in Fig. 37 (red bars). A comparison with the original decomposition of Sect. 6.6.1 shows that the computation time is (slightly) reduced with the modified decomposition. For the ExSLocal approach, for instance, the computation time is reduced by 14%.

Case study 3: Extended model and multirate effect
Finally, the oscillator chain is modified to demonstrate the multirate effect. Small subsystems with 25 masses are added at both ends of the oscillator chain. In the newly added small subsystems (subsystems 1 and 103), contact forces F Con,i = A e Bx i with A = −1.0E − 2 N and B = 5.0E5 mm −1 are assumed to act at three masses so that these two subsystems become very stiff. Subsystems 2-102 are identical with the model of Sect. 6.6.2 (modified decomposition).
Considering the monolithic simulation of the extended model, the stiff contacts strongly affect the computation time, since the solver-step size is reduced for the complete model. However, applying a co-simulation approach, the integration-step size is selected individually by the step-size controllers of the subsystem solvers. Therefore, the solver-step sizes of subsystems 1 and 103 will be reduced due to the stiff contacts. The remaining subsystems 2-102 will be integrated with larger step sizes. Since the subsystems 1 and 103 are very small, the overall computation time of the co-simulation will not be increased significantly compared to the non-extended model.
Simulation results with the extended model (containing contacts in subsystems 1 and 103) are shown in Fig. 37 (yellow bars). As expected, the computation time of the monolithic simulation is significantly increased, while the computation time of the co-simulation approaches are not changed markedly. The number of macro-steps and the resulting global error are only slightly increased.
The speed-up factor of the explicit co-simulation (ExMilne) compared to the monolithic simulation is ≈400 for the extended model with contacts (simulation time of the monolithic model: 5 1 2 days; simulation time of the co-simulation model: 30 minutes). This very high speed-up factor has mainly two reasons: (i) dynamical parallelization of the model by means of the co-simulation and (ii) exploiting the multirate effect by an appropriate model decomposition.

Conclusions
An algorithm for controlling the degree of the polynomials, which are used for approximating the coupling variables within co-simulation and solver coupling approaches, has been presented. The order controller has been incorporated into the algorithm for controlling the macro-step size so that co-simulations with variable order and variable macro-step size could be realized. Different numerical examples have been presented, which confirm the functionality and the accurate operation of the combined order and step-size control algorithm. The numerical tests also clearly demonstrated the benefit, which can be achieved with a combined order and macro-step size control algorithm.
Using a constant (i.e., fixed) integration order, the question on the proper choice of the integration order arises. As a matter of fact, the optimal integration order is problem-dependent and may change during the co-simulation. Using a constant integration order can cause different difficulties: • If the integration order is chosen too high, the co-simulation may get unstable or the macro-step size may be reduced significantly by the macro-step size controller. • Choosing the integration order too small, the co-simulation may become inefficient and less accurate. • Especially for highly nonlinear systems, for systems with impacts or for models where steep gradients occur, a simulation with a fixed integration order may be rather inefficient.
The main problems associated with the usage of a constant integration order -namely a trade-off between stability/accuracy and efficiency -will disappear, if an order controller is used.
The different numerical examples presented in Sect. 6 have shown that in most cases the efficiency and accuracy of a co-simulation could be (markedly) improved by incorporating an order control algorithm.
In our tests, failures of the order controller have not been detected, although rather stiff systems, systems with contacts and systems with steep, impulse-shaped external forces have been considered. A rather stable and robust performance has been observed, which clearly recommends the usage of a variable integration order.
Considering implicit co-simulation methods, an interface-Jacobian is required for the corrector iteration, which is usually calculated by a finite difference (FD) approach. Applying an FD approach, a large number of additional cores might be necessary in order to calculate the interface-Jacobian in parallel with the predictor/corrector subsystem integrations. Alternatively, approximation formulas for the interface-Jacobian might be used (low-order sensitivities, LOS), which can be implemented in a very straightforward manner. Using the LOS approach, the additional parallel subsystem integrations required in connection with the FD approach can be omitted without a significant increase of the computation time.
Depending on the problem, the usage of low-order sensitivities can (slightly) increase the co-simulation time, since the number of corrector iterations may (slightly) increase compared to the FD approach. Accuracy of the co-simulation is, however, not affected. In contrast, problems with ill-conditioning and numerical cancellation, which are often observed in association with FD approaches, will not occur if low-order sensitivities are applied.
Summarizing, the innovation of this work concerns the realization of variable-order cosimulations and the incorporation of the order control algorithm into the macro-step size controller. Using a combined order and macro-step size controller may significantly increase the numerical efficiency and accuracy of a coupled simulation. While co-simulation is frequently used to carry out multi-physical or multi-disciplinary simulations, co-simulation methods may also be applied to parallelize a mono-disciplinary system in time domain. To exploit the full potential of a co-simulation-based parallelized dynamical model, controlling the integration order and the macro-step size may be an essential point.
Concerning the future work, the subsequent topics may -amongst others -be interesting: • Application of the presented approaches to co-simulation methods based on constraint coupling. • Usage of co-simulation techniques to parallelize large-scaled finite element models in time domain. • Development of error estimators for alternative explicit and implicit co-simulation methods. • Error estimation and macro-step size control for systems with discontinuities.

Appendix A
This appendix contains additional information on the calculation of the interface-Jacobian, which is required for the implicit co-simulation scheme, see Sect. 3.2. Basically, the interface-Jacobian dg c du *

N+1
can be computed in different ways, e.g., by using: • analytical gradients, • numerically calculated gradients based on a finite difference approach, • approximated gradients, • gradients computed with the help of automatic differentiation.
Here, the gradients are calculated either by a finite difference approach (Appendix A.1) or with the help of a low-order approximation technique (Appendix A.2).
To explain the two approaches, we consider the equations of motion of an arbitrary subsystem s. For a concise representation, the subsystem is assumed to be a multibody system.
with {x s , v s } ∈ R n , λ s ∈ R m , see Sect. 2.2. To keep the notation simple, the subsystem index s is omitted in the following (i.e., x s → x, etc.). The coupling variables are approximated with the polynomials p * N+1 (t) = P k N+1 t; T N+1 , u * N+1 , (T N , u N ) , . . . , (T N−k+1 , u N−k+1 ) where L i (t) represent the Lagrange basis polynomials.
The two approaches for calculating the interface-Jacobian are explained by regarding the αth corrector step, i.e., the calculation of the interface-Jacobian dg c du * N+1 u cor,α−1
To compute the interface-Jacobian, it is useful to rewrite the term dg c du *

N+1
. The total deriva- where I is the identity matrix. Since the coupling functions g c := u − ϕ (t, z, u) = 0 or, more precisely, the functions g c := u − ϕ (t, x, v, u) = 0 are assumed to be explicitly known, the terms ∂ϕ ∂u *

N+1
can be expressed analytically. Note that for systems without feed-through, the coupling equations simply read g c := u − ϕ (t, x, v) = 0 so that the term ∂ϕ ∂u * N+1 equals zero. The main task to compute the interface-Jacobian is therefore the calculation of the term which describe the sensitivities of the subsystem states with respect to the coupling variables, have to be determined.

A.1 Numerical calculation of the interface-Jacobian using a finite difference approach (FD sensitivities/gradients)
To numerically calculate the partial derivatives of Eq. (41) with the help of a finite difference approximation, additional subsystem integrations have to be carried out in parallel with the , where u * i,N+1 = u * N+1 · e i represents an arbitrary input variable.

N+1
for the next corrector step is calculated in the same way.
The perturbation increment u * i,N+1 has to be chosen very carefully. For the simulations in Sect. 6, the subsequent perturbation increments have been used where u min i is a user-defined (small) value; u min i has to be chosen appropriately with respect to the type of the coupling variable (position, velocity or force variable).
The corrector iteration is terminated if a convergence criterion is fulfilled. For this purpose, the scalar weighted root-mean-square norm is applied; atol i and rtol term user-defined absolute and relative error tolerances, respectively. The corrector iteration is terminated if the following criterion is fulfilled: The factor R α =

Appendix B
In Sect. 6.3, the interface-Jacobian for the implicit co-simulation approaches (ImMilne and ImSLocal) has been calculated with a finite difference (FD) approach. In this appendix, the same simulations are considered; the interface-Jacobian is, however, generated with the loworder sensitivities (LOS) of Appendix A.2. As can be seen in Figs. 38,39,40,41, the basic behavior with the low-order sensitivities is quite similar to the simulation results with FD gradients. For the considered model and parameters, the performance with the FD gradients is little better. Applying the LOS approach, the number of corrector iterations is usually little larger compared with the FD implementation so that the co-simulation time may slightly be increased in association with the LOS approximation. Using FD gradients, numerical problems (ill-conditioning, numerical cancellation) may be observed if the perturbation increments are chosen inappropriately or if the macro-step size becomes very small. Then, the usage of LOS may even be more efficient than applying FD gradients [29].   Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.