Identification of nonlinear dynamical systems with time delay

This paper proposes a technique to identify nonlinear dynamical systems with time delay. The sparse optimization algorithm is extended to nonlinear systems with time delay. The proposed algorithm combines cross-validation techniques from machine learning for automatic model selection with an algebraic operation for preprocessing signals to filter the noise and to remove the dependence on initial conditions. We further integrate the bootstrapping resampling technique with the sparse regression to obtain the statistical properties of the estimation. We use the Taylor expansion to parameterize the time delay. The proposed algorithm is computationally efficient and robust to noise. A nonlinear Duffing oscillator is simulated to demonstrate the efficiency and accuracy of the proposed technique. An experimental example of a nonlinear rotary flexible joint is presented to further validate the proposed method.


Introduction
Time delay exists in many engineering, physics, chemistry, biology and economics systems. In control systems, time delay due to sensor and actuator dynamics, signal transmission, and digital computations is an important factor that influences the stability and control performance. To make matters worse, time delay is often unknown. Time delay estimation in a control system is a challenging problem. It is even more challenging when the system dynamics are nonlinear and unknown. This paper presents a nonparametric identification technique to identify nonlinear dynamic systems and estimate the time delay introduced by the feedback control.
The methods for time delay estimation can be grouped into frequency-domain and time-domain approaches [17,18]. In this paper, our focus is on the time-domain approaches. Since time delay usually appears in the system implicitly, the methods for conventional parameter estimation of dynamic systems cannot be directly applied to estimate it. Time delay τ usually appears in the exponential term e^{−τs} in the transfer function of the system. Expansion techniques can parameterize it, including the classical Padé approximation, the Laguerre Fourier series, the Kautz series, the second-order Padé shift and the diagonal Padé shift. The main concern with rational approximations is the truncation error and stability complications. Even though higher-order expansions can reduce the truncation error, the approximated system can become unstable even when the original system is linear with a constant time delay [1]. In this paper, we employ the Taylor expansion. Xu demonstrates that a low-order Taylor expansion gives promising estimates of small time delays [19].
The work in [20] presents a nonlinear least-squares-based algorithm in which the instrumental variable method estimates the parameters of the transfer function of the system, while an adaptive gradient-based iteration finds the optimal time delay from the filtered, irregularly sampled data. The main problem with this algorithm is that the proposed cost function may have several local minima. Therefore, it depends strongly on the initial guess for the parameters, especially the time delay. To deal with this issue, the authors use a low-pass filter to widen the convergence region around the global minimum. Another nonlinear recursive optimization algorithm is proposed in [21], which combines the linear Levenberg-Marquardt method to compute the plant parameters with a modified Gauss-Newton algorithm to estimate time delays. A low-pass filter and a binary transformation are applied to the data corrupted by white noise to create the regressor matrix for minimizing a quadratic cost function of the estimation error. The identification is online and is demonstrated on a MISO plant with multiple time delays. Similarly, the algorithm in [9] employs the Gauss-Newton method to estimate time delay while the simplified refined instrumental variable (SRIVC) method is used to find the plant parameters.
The Taylor expansion is used to parameterize the system with explicit time delay in [15]. An adaptive law for the parameter estimation is proposed such that the estimation error converges. A recursive formula is introduced in [6] to improve the accuracy and convergence rate of an earlier recursive algorithm for online estimation [22]. The strategy in [8] aims at fractional time delay identification for discrete-time systems. It separates the influence of the system structure and the time delay by discretizing the system. With the help of the Kalman filter, the parameters are estimated recursively. The algorithm in [11] first parameterizes the system by a polynomial function and then trains a neural network to estimate the parameters and time delay. Schweizer and Wolff's σ measure, denoted σ_SW, from copula theory is introduced to study the relationship of the input-output signals for SISO systems [23]. It is found that the measure reaches its maximum when the time delay is removed from the data. This property offers an approach to estimate time delay without the need to estimate other parameters.
Most existing methods for time delay estimation rely on knowledge of the system model. In this work, we propose the application of a recently developed nonparametric system identification technique to time delay estimation. Brunton and colleagues proposed a method called the sparse identification of nonlinear dynamics (SINDy) in 2016 for creating a sparse representation of the unknown nonlinear function of the system [24]. The method has attracted great attention from the community. It assumes that only a few important terms out of a library of functions are needed to describe the dynamics of the system. It combines machine learning and sparsity-promoting techniques to find the sparse representation in the space of possible functions to model nonlinear dynamical systems. The connection of the SINDy method to the Akaike information criterion (AIC) for model selection has been studied in [25]. Promising results for system identification problems such as hybrid dynamical systems, the chaotic Lorenz system and Burgers' partial differential equation have been obtained [25,26]. The SINDy method applied to model predictive control delivers better performance, requires significantly less data, and is more computationally efficient and robust to noise than a neural network model [27]. Robustness to noise and the requirement for measurements of derivatives are concerns with the SINDy approach [28]. Total variation regularized derivatives are commonly used to estimate the derivatives [29,30]. An integral form of the equations of motion in combination with sparse regression is proposed in [31]. An application to model identification of nonlinear mechanical systems is reported in [32].
In practical cases, the data is often contaminated with noise. Fliess and Ramirez proposed a robust and fast algebraic identification technique to estimate time-invariant linear systems without time delay [33]. Inspired by the algebraic operation, Belkoura in [12] and [13] first investigated the identifiability conditions for a general class of systems described by convolution equations. An algebraic formulation was then introduced for online estimation of time delay and parameters with structured or arbitrary inputs and outputs.
In this paper, we extend the SINDy approach by combining it with the algebraic signal processing method to deal with the issues of measurement noise, initial conditions and derivatives. The algebraic operation generates useful signals for system identification while filtering out the noise. We apply the Taylor expansion to make the time delay appear as a parameter of the model to identify. A nonlinear extended state estimator is adopted for derivative estimation. As a result, we arrive at a robust sparse regression combined with cross-validation and bootstrapping techniques for nonparametric system identification. The simulation and experimental results illustrate that the proposed algorithm can overcome the following limitations of the existing techniques:
- Multiple local minima. Reaching the global minimum is the main challenge for modified least-squares methods with a recursive approach. The sparse regression is a convex optimization problem with a global minimum [34].
- Nonparametric nonlinear identification. The only assumption of the proposed algorithm about the structure of the system is that the system is sparse in the space of base functions. The proposed algorithm relies on the data to make the selection and is not limited to linear and SISO systems.
- Noise resistance. The sparse regression is inherently robust to noise. The algebraic operation offers additional filtering of the noise. Furthermore, the proposed algorithm is equipped with bootstrapping to study the statistics of the estimation, such as the mean and standard deviation.
- Unstructured entries. Many classical strategies are designed for pulsed entries. The proposed approach analyzes the input and output data without any frame or structure assumption on the entries.
- Initial conditions. Another common assumption in time delay estimation is that the initial conditions are zero, while in most practical applications this is not true. The algebraic operation and operational calculus make the proposed algorithm independent of the initial conditions.
The rest of the paper is organized as follows. Section 2 introduces the assumptions and formulates the mathematical problem of system identification. The techniques of algebraic data preprocessing and derivative estimation, sparse regression in combination with bootstrapping resampling, and cross validation are explained in Sect. 3. Section 4 presents an example of a simulated nonlinear mass-spring-damper system under a proportional control with time delay. The experimental validation of the proposed algorithm on the rotary flexible joint made by Quanser is presented in Sect. 5. Finally, Sect. 6 concludes the paper.

Problem definition and assumptions
Consider a closed-loop second-order system given by

m ẍ(t) + g(x, ẋ) = f(t) − b₀ x(t − τ) − b₁ ẋ(t − τ),   (1)

where m > 0 is the mass of the system and the function g(x, ẋ) represents the nonlinear restoring or internal force of the system. The term f(t) contains the reference information as well as external disturbances. The control consists of the output and its first-order derivative with feedback gains b₀ and b₁, and a time delay τ. We make the following assumptions in this study.

Assumptions
1. Only the system response x(t) is measured.
2. The system is exposed to random excitations included in f(t).
3. The measurement contains Gaussian white noise with zero mean, denoted as ε_x.
4. The system structure in terms of the function g(x, ẋ) is unknown.
5. The system is second order.

This paper is focused on nonparametric identification of closed-loop nonlinear dynamical systems with a control time delay.

The proposed method
For the restoring force g(x, ẋ) with an unknown structure, there are many candidate functions available to approximate it, such as polynomial, trigonometric and exponential functions, or a combination of these. Because the polynomial is a popular base function to describe a wide range of dynamical systems, we use regular polynomials of x and ẋ with time-invariant coefficients to explain the proposed method,

g(x, ẋ) = Σ_{i=0}^{N} Σ_{j=0}^{M} c_ij x^i ẋ^j,   (2)

where c_ij are the unknown coefficients of the polynomial.
Other functions can also be considered with the proposed method.
For a system without any prior knowledge, the orders N and M of the polynomial are unknown. This study develops an algorithm to identify the polynomial defined in Eq. (2) from the response data such that it has a minimum number of terms and the minimum orders N and M.

Algebraic operation
Noise robustness has always been an essential requirement of system identification techniques, as noise is unavoidable in real data. Time delay caused by sensors and actuators is usually so small that it is difficult, or even impossible, to estimate its correct value from noisy data. In addition to noise robustness, dynamical system identification methods require derivatives of the time series of the measured response. Traditional finite difference methods for estimating derivatives can amplify the noise. To deal with these issues, we propose to apply the algebraic operation to pre-process the data.
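The algebraic operation exploits two standard Laplace-transform identities: differentiation of F(s) with respect to s corresponds to multiplication of f(t) by −t (which annihilates the polynomial-in-s initial-condition terms), and division by s corresponds to time-domain integration (which avoids differentiating noisy data). A minimal numerical check of the first identity, independent of the paper's specific signals:

```python
import numpy as np

# Check that d^2F/ds^2 corresponds to t^2 f(t) in the time domain.
# Example: f(t) = exp(-t), so F(s) = 1/(s + 1) and d^2F/ds^2 = 2/(s + 1)^3.

def trapezoid(y, x):
    # simple trapezoidal rule, to stay independent of the NumPy version
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

t = np.linspace(0.0, 50.0, 200_001)
f = np.exp(-t)
s = 1.5

lhs = trapezoid(t**2 * f * np.exp(-s * t), t)   # numerical L{t^2 f(t)}(s)
rhs = 2.0 / (s + 1.0) ** 3                      # analytical d^2F/ds^2
print(lhs, rhs)                                 # both close to 0.128
```

The agreement of the two values illustrates why differentiating the transformed equation in s, as done below, produces time-domain signals that no longer carry the initial conditions.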
We use a nonlinear oscillator under a proportional control with gain k_p to illustrate the method:

m ẍ + c₁ ẋ + c₂ ẋ² + c₃ ẋ³ + k₁ x + k₃ x³ = f(t) − k_p x(t − τ),   (3)

where m denotes the mass, c₁, c₂ and c₃ stand for the damping coefficients, and k₁ and k₃ are the stiffness coefficients. The system starts from initial conditions (x₀, v₀), which are generally unknown. We introduce several short-hands for the nonlinear terms.
Applying the Laplace transform to Eq. (3), we obtain an algebraic equation in the Laplace variable s. Consider the Taylor expansion of the exponential term e^{−sτ},

e^{−sτ} = 1 − sτ + (sτ)²/2! − (sτ)³/3! + ⋯

We should point out that keeping too many higher-order terms may not be beneficial to the identification process. We only need to keep a sufficient number of terms to generate enough equations to determine the unknown parameters, including the time delay.
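The truncation error of this expansion can be checked numerically; the following sketch (with placeholder values of τ and s, not tied to the paper's examples) shows how quickly the error shrinks with the expansion order for a small delay:

```python
import math

def taylor_exp_neg(x, order):
    # truncated Taylor series of exp(-x): sum_{n=0}^{order} (-x)^n / n!
    return sum((-x) ** n / math.factorial(n) for n in range(order + 1))

tau, s = 0.05, 2.0                      # a small delay and a sample s value
exact = math.exp(-s * tau)
for order in (1, 2, 3, 5):
    print(order, abs(exact - taylor_exp_neg(s * tau, order)))
```

For sτ well below one, the third-order truncation is already accurate to several significant digits, which is consistent with keeping only a few terms in the identification.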
As an example, we keep the terms up to the third order and substitute the Taylor expansion into Eq. (5).
To eliminate the initial conditions, we differentiate Eq. (8) with respect to s twice. To eliminate the derivative terms in the time domain, we divide the resulting equation by s³. Transforming back to the time domain, we obtain an equation of signals. Let {t_k, k = 1, 2, …, n_t} be a set of sampled times, and define the error e(t_k) at time t_k as the residual of this equation. Let us introduce an error vector e, a parameter vector c and a force vector p as follows.
Let P(x) be the n_t × 7 data matrix determined by the measurement of x(t), such that its (k, j)-th component is given by P_{k,j} = P_j(t_k). Then, Eq. (11) can be written in the matrix form e = P(x) c − p. To find the unknown parameters c, we formulate a least mean square error problem: we seek the parameter vector c that minimizes the cost function J = eᵀ e. The solution for c can be found either by matrix inversion or by an iterative search algorithm. In general, when the number of unknown parameters and the number of data points n_t are large, as is the case for multi-degree-of-freedom systems, it is better to use a global search algorithm to compute the parameters.
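A minimal sketch of this least-squares step, using a synthetic stand-in for the data matrix P(x) and force vector p:

```python
import numpy as np

# Given a data matrix P (n_t x 7 here) and a force vector p, find c
# minimizing J = ||P c - p||^2. The data below is synthetic, not the
# paper's actual preprocessed signals.
rng = np.random.default_rng(0)
P = rng.normal(size=(500, 7))
c_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5, 0.0, 3.0])
p = P @ c_true + 0.01 * rng.normal(size=500)

c_hat, *_ = np.linalg.lstsq(P, p, rcond=None)   # least-squares solution
print(np.round(c_hat, 2))
```

Using `lstsq` (a QR/SVD-based solver) rather than explicitly inverting PᵀP is the standard numerically stable choice when n_t is large.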

Sparse representation
Assume that the displacement measurement x(t) of the system in Eq. (3) and the external force f(t) are available and contain random noise. We consider the polynomial model for the restoring force in Eq. (2). Two questions immediately arise: What are the minimum orders N and M? Are all the terms in the polynomial needed?
To investigate answers to these questions, we apply methods from statistical learning [35]. One popular way is to start with sufficiently large orders N and M, or a large library of functions which may contain non-polynomial functions, and penalize the terms with small coefficients through a sparse regulator by using the least absolute shrinkage and selection operator (LASSO) [35]. The recent studies of sparse identification of nonlinear dynamics (SINDy) have developed a computationally efficient algorithm to compute the solution of the sparse regression. The LASSO adds an L₁ regularization term to the cost function in Eq. (14) to penalize the nonzero parameters that do not contribute to the system's dynamics,

J_λ = J + λ ‖c‖₁,   (15)

where λ > 0 is a preselected positive number known as the sparse regulator and ‖c‖₁ = Σᵢ |cᵢ|. The sparse regression algorithm in [24] is applied to find the parameter vector c that minimizes J_λ in Eq. (15). The algorithm is computationally efficient and robust to noise. It should be noted that the optimization problem in the framework of the LASSO is convex [34], which implies the existence of a unique optimal solution. The selection of the sparse regulator λ is critical. In Sect. 3.3, we explain how to employ cross-validation techniques to select a proper regulator value. The LASSO penalizes the terms with small coefficients by regularization and keeps the important terms in the system model. In the real world, time delay is usually small. Moreover, time delay appears in the high-order terms of the Taylor expansion. Therefore, the LASSO tends to penalize these terms, such as the coefficient of the term P_1(t) in Eq. (11).
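The sparse solver of [24] is the sequentially thresholded least-squares scheme; a compact sketch on synthetic data (the library matrix and coefficients below are illustrative, not the paper's):

```python
import numpy as np

def stls(P, p, lam, n_iter=10):
    """Sequentially thresholded least squares, the sparsity-promoting
    scheme used by SINDy as a proxy for the LASSO problem: coefficients
    smaller than lam are zeroed, the rest are refit, repeatedly."""
    c = np.linalg.lstsq(P, p, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(c) < lam
        c[small] = 0.0
        big = ~small
        if big.any():
            c[big] = np.linalg.lstsq(P[:, big], p, rcond=None)[0]
    return c

rng = np.random.default_rng(1)
P = rng.normal(size=(400, 10))          # synthetic 10-term library
c_true = np.zeros(10)
c_true[[0, 3, 7]] = [1.5, -2.0, 0.8]    # only three active terms
p = P @ c_true + 0.01 * rng.normal(size=400)

c_est = stls(P, p, lam=0.1)
print(np.round(c_est, 2))
```

With a threshold between the noise floor of the coefficients and the smallest active term, the scheme recovers exactly the sparse support.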
We can recover the terms involving time delay in the following way. After each sparse regression computed with the SINDy algorithm, we keep all the terms involving time delay as well as the terms selected by the LASSO regularization. The resulting data matrix, denoted by P_s(x), is substituted in Eq. (15) to compute the updated coefficients c.

Cross validation and bootstrapping
It is common to use cross-validation techniques from machine learning to determine the sparse regulation parameter λ. The value of λ is sampled in a finite interval. The cross-validation mean square error MSE_CV of the model over the test dataset is computed, and the λ value which minimizes the cross-validation error is selected. The corresponding model is chosen as an optimal model with a proper balance of complexity and accuracy.
Let T_cv denote the set of time instances of the test signal to be used for cross validation. For a given regulation parameter λ, the mean square cross-validation error of the model over all the validation datasets is defined as

MSE_CV(λ) = (1 / |T_cv|) Σ_{t_k ∈ T_cv} e²(t_k; c_λ),

where c_λ denotes the model parameters for the regulation parameter λ. MSE_CV is an implicit function of λ. The SINDy algorithm attempts to select λ on the Pareto front of the multiobjective optimization problem whose objectives are the accuracy and complexity of the model. The elbow of the Pareto front parameterized by λ is often the choice [24]. Unfortunately, in most cases, the elbow of the Pareto front is ambiguous due to the existence of a cluster of candidate models near the elbow. The information criteria (IC) for the candidate models can help to rank and select a model with a proper trade-off between accuracy and complexity [25]. Popular statistical examples of the information criteria include the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the deviance information criterion (DIC) and the minimum description length. The work reported in [25] makes use of a big data matrix P(x). The sparse regression procedure is repeated over all possible combinatorial subsets of the data matrix. The resulting models are ranked through IC scores, and the model with the smallest score is selected.
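The λ selection by cross validation can be sketched as follows, pairing a SINDy-style thresholded least-squares solver with a held-out dataset; all data below is synthetic:

```python
import numpy as np

def fit_sparse(P, p, lam, n_iter=10):
    # SINDy-style thresholded least squares as the sparse solver
    c = np.linalg.lstsq(P, p, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(c) < lam
        c[small] = 0.0
        if (~small).any():
            c[~small] = np.linalg.lstsq(P[:, ~small], p, rcond=None)[0]
    return c

rng = np.random.default_rng(2)
P_tr = rng.normal(size=(300, 8))        # training data matrix
P_cv = rng.normal(size=(200, 8))        # held-out validation data matrix
c_true = np.zeros(8)
c_true[[1, 4]] = [2.0, -1.0]
p_tr = P_tr @ c_true + 0.05 * rng.normal(size=300)
p_cv = P_cv @ c_true + 0.05 * rng.normal(size=200)

# sweep lambda, score each fitted model on the validation set
lambdas = np.logspace(-4, 0, 20)
mse_cv = [np.mean((P_cv @ fit_sparse(P_tr, p_tr, l) - p_cv) ** 2)
          for l in lambdas]
lam_best = lambdas[int(np.argmin(mse_cv))]
c_best = fit_sparse(P_tr, p_tr, lam_best)
print(lam_best, np.round(c_best, 2))
```

The minimizer of the validation error keeps the active terms while pruning the spurious ones, which is the balance of accuracy and complexity sought above.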
However, a big data matrix P(x) may not always be available in real-world applications. In this work, we propose a search algorithm to determine the sparse representation of the polynomial in Eq. (2). We start with the linear model, N = M = 1, and increase the polynomial order until the prediction error of the model over the test dataset reaches an acceptably low level and begins to increase. The error over the test dataset is defined analogously to the training error, with ĉ standing for the estimated vector of coefficients.

There are different ways to generate training and test datasets. When the SINDy algorithm is applied to simulation examples, we can generate rich datasets to train and validate the model by considering the system responses for different initial conditions and excitations. In this work, we assume that the dataset consists of two long time series of the system response: one is used for training and the other for cross validation. The data matrices P(x) are generated for different orders of the polynomial. The SINDy algorithm selects the sparse model built on the training data, while the regularization parameter λ is selected with the help of cross validation on the test dataset.

Real measurements often contain noise, and can have outliers and missing data. Here, we propose to combine the sparse regression with bootstrapping in order to develop a robust sparse regression. In particular, we consider K bootstrap sample vectors, each containing L elements of the original data points. Each vector is generated by uniform sampling of the data with replacement. For each bootstrap sample vector, the sparse regression is applied to identify the model. Finally, the parameters of the model are taken as the average of the K estimated coefficient vectors ĉ_l.
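The bootstrapped regression can be sketched as follows; for brevity, plain least squares stands in for the sparse regression inside the loop, and the data is synthetic:

```python
import numpy as np

# K resamples of the rows of (P, p) with replacement, one fit per resample,
# then the mean and standard deviation of the K coefficient vectors.
rng = np.random.default_rng(3)
P = rng.normal(size=(400, 5))
c_true = np.array([1.0, 0.0, -0.5, 2.0, 0.0])
p = P @ c_true + 0.05 * rng.normal(size=400)

K, L = 50, 200                              # K bootstrap vectors of L samples
boots = []
for _ in range(K):
    idx = rng.integers(0, len(p), size=L)   # uniform sampling with replacement
    boots.append(np.linalg.lstsq(P[idx], p[idx], rcond=None)[0])
boots = np.array(boots)

c_mean, c_std = boots.mean(axis=0), boots.std(axis=0)
print(np.round(c_mean, 2), np.round(c_std, 3))
```

The standard deviation across resamples gives the uncertainty of each estimated coefficient, which is the statistic reported alongside the estimates below.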
The standard deviation of the estimated coefficient vectors ĉ_l is computed to study the variation of the parameter estimation. Algorithm 1 summarizes the procedure.

Fig. 1 The cross validation test error for the mass-spring-damper system

Remark 3.1 Some remarks on the estimation error and the stability of the system are in order. The estimation error of the coefficients c, in particular the time delay τ, can be attributed to two sources: the truncation error of the Taylor expansion and the regression error.
While keeping more higher-order terms of the Taylor expansion helps to reduce the truncation error, we run the risk of instability of the truncated model. We have found that truncating at the third-order term is a good compromise between accuracy, complexity and stability.

Simulated example
To demonstrate the proposed algorithm, we consider the second-order oscillatory system with nonlinear stiffness and damping in Equation (3). The following external forces are used to generate the training and test datasets.
f_Train(t) = 10 sin(4t) + 40 sin(4t²) + ε_f   (19)
f_Test(t) = 10 sin(2t) + 40 sin(8t) + ε_f   (20)

For the training dataset, the system starts from the initial conditions x₀ = 1 and v₀ = 0. For the test dataset, the initial conditions are x₀ = −1 and v₀ = 0.5. The excitation contains a normally distributed random noise ε_f with zero mean and standard deviation σ_f = 0.01. We assume that the sensors have measurement noise such that the system output is given by x(t) + ε_x, where ε_x is a normally distributed random noise with zero mean and standard deviation σ_x = 0.01. Table 2 lists the system parameters used in the simulation. A proportional control with gain k_p = 2 is considered.

We employ the third-order Taylor expansion of the delayed term x(t − τ). The order of the expansion is selected such that there are enough equations to solve for the unknown parameters of the system. Following Algorithm 1, we create the data matrix P(x). Because the first term includes the time delay, we constrain the LASSO algorithm to keep it from being penalized and removed in the process of searching for the sparse representation. The average of the sparse polynomials over all the bootstrapping samples is taken as the final result.

Figure 1 shows the variation of the cross-validation error MSE_CV as a function of the order N of the polynomial. It is seen from the figure that the cross-validation error drops sharply when approaching N = 3, stays in the same range for N = 4, and starts increasing from N = 5. We choose N = 3 as the optimal order of the polynomial, where the minimum validation error occurs with the minimum order. Table 1 lists the estimated coefficients of the signals in Eq. (9). The proposed algorithm has successfully detected the sparse terms and precisely estimated the coefficients. The time delay and parameters of the original system are calculated and listed in Table 2 together with the known values. The accuracy of the estimated parameters is quite acceptable.
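For readers who wish to reproduce a similar setup, a delayed oscillator of the assumed form of Eq. (3) can be integrated with a simple history buffer. The restoring-force terms and parameter values below are illustrative assumptions, not the values of Table 2:

```python
import numpy as np

# Semi-implicit Euler integration of a delayed Duffing-type oscillator:
# m*x'' + c1*x' + c2*x'^2 + c3*x'^3 + k1*x + k3*x^3 = f(t) - kp*x(t - tau)
# Placeholder parameters; the pre-history of x is assumed constant at x0.
m, c1, c2, c3, k1, k3 = 1.0, 0.1, 0.05, 0.05, 1.0, 0.5
kp, tau = 2.0, 0.05
dt, T = 0.001, 20.0
n, d = int(T / dt), int(tau / dt)       # d = delay expressed in time steps

x = np.zeros(n)
v = np.zeros(n)
x[0] = 1.0                              # initial conditions (x0, v0) = (1, 0)
for i in range(n - 1):
    x_del = x[i - d] if i >= d else x[0]       # delayed feedback signal
    f = 10.0 * np.sin(4.0 * i * dt)            # external excitation
    a = (f - kp * x_del - c1 * v[i] - c2 * v[i] ** 2 - c3 * v[i] ** 3
         - k1 * x[i] - k3 * x[i] ** 3) / m
    v[i + 1] = v[i] + a * dt
    x[i + 1] = x[i] + v[i + 1] * dt            # semi-implicit position update
print(float(np.max(np.abs(x))))
```

The noisy displacement record x(t) + ε_x produced from such a simulation is the input to the identification pipeline described above.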
A remark on the small standard deviations reported in Table 1 is in order. They indicate that the randomness of the data is not significant, so the bootstrapping samples are not sufficiently different. We have checked the results several times to confirm this observation.

Experimental example
In this section, we employ a flexible joint experimental setup, made by Quanser, to demonstrate the efficiency of the method on experimental data from a single-input-multiple-output (SIMO) dynamical system. The joint is connected to the base through two springs with equal stiffness k_s, and the base is fixed to the servo motor, as Fig. 2 shows. The servo motor with input voltage V_m generates a torque τ_M to turn the base with a rotation angle θ. The deflection angle of the joint relative to the base is α. The moment of inertia of the base is J_eq. The viscous damping coefficient of the base is B_eq. J_l stands for the moment of inertia of the joint. The equations of motion for the system can be derived accordingly, where the torque τ_M is linearly dependent on the servo motor voltage V_m and θ̇. We assume that the motor parameters are known. Table 3 lists the parameters of the motor and joint provided by Quanser. We should point out that these numbers may not be exactly the same as the physical system parameters. Therefore, we should treat them as a reference.
A proportional control with gain k_p adjusts the motor voltage to make the joint follow the desired trajectory of the angle θ while the deflection angle α stays minimal. The control is a proportional feedback of the tracking error of θ and is implemented in Simulink/MATLAB. The outputs of the system are θ and α. A time delay is introduced in the feedback signal θ.
In this example, we employ the second-order Taylor expansion of the delayed term, so that the closed-loop system can be written in a parameterized form. We should point out that the tracking performance of the closed-loop system will not be as good as we would like. This is because the proportional control alone is not adequate, and the time delay further deteriorates the performance. The purpose of choosing this control with time delay is simply to generate the data in a stable manner.
To generate the training dataset, we select a desired trajectory θ_d given by Eq. (26) with frequency f₁ = 0.66 Hz. To generate the test data, we choose a square-wave signal for the desired trajectory θ_d with the same amplitude and frequency as in Eq. (26). We introduce a 0.2 s delay in the control such that the closed-loop system remains stable. Figures 3 and 4 show the motor input voltage V_m and the sample responses of θ and α. The time series is one minute long with a sample time of 0.001 s. For simplicity, we assume that for each coordinate θ and α, the restoring force can be represented by a sum of four polynomials of a single variable, i.e. θ^i, θ̇^i, α^j and α̇^j, together with a friction term related to θ̇, without considering the cross-product terms. The second coefficient in Eq. (25), a k_p τ²/2, includes the time delay. We apply the constraint to the LASSO algorithm to keep it from being penalized and removed.
Some signals in the matrix P(x) contain the first-order derivatives of the angles, θ̇ and α̇. Recall that the real measurements contain noise. We use a nonlinear state observer from the control literature to estimate the first-order derivatives without amplifying the noise in the computation [38,39]. The second-order observer with gains β₁ and β₂ is defined by the following state equations,
where

fal(e, α, δ) = |e|^α sign(e) if |e| > δ, and e / δ^{1−α} if |e| ≤ δ,   (28)

and the error e = z₁ − θ. We have chosen α = 0.5, δ = 0.05, β₁ = 100 and β₂ = 900. According to [38,39], the estimation error e converges to zero quickly. Consequently, by definition, z₂ is an accurate estimate of the first-order derivative θ̇. Figures 3 and 4 show the plots of the estimated θ̇ and α̇ for both the training and test datasets.

To train the model, we consider 10 bootstrapping samples, each with a length of 50 percent of the training dataset, and search the polynomial order from 1 to 7. For the sparse regulator, 100 values of λ are sampled logarithmically in the range from 10⁻¹⁰ to 10⁰. Figure 5 shows the variation of the cross-validation error as a function of the order of the polynomials. The results suggest that for both coordinates θ and α, the polynomial order N = 5 leads to the minimum cross-validation error. This is the optimal order for the polynomials. Table 4 lists the polynomial terms, the corresponding coefficients and the associated standard deviations for the trained model of polynomial order N = 5. We should mention that the reported results have been rounded to 10⁻⁵.

The estimated value of the Coulomb friction f_c is not negligible. It can play an important role in the system dynamics. We should point out that the linear model in Eq. (21) does not include the friction term, although the actual device always has friction. The friction is the primary reason for the discrepancy between the response predicted by Eq. (21) and the actual measurements. Figures 3 and 4 indicate that the responses of the closed-loop system do not track the references accurately. This is partly due to the effect of time delay and also the poor control design, as discussed earlier. The oscillatory responses of the closed-loop system are responsible for the high-order terms of the polynomials, particularly for the deflection angle α, as can be seen in Table 4.
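The observer can be sketched in a few lines. The form of the state equations follows Han's standard second-order observer, which we assume matches the paper's Eq. (27); the measured signal is a synthetic stand-in for θ:

```python
import numpy as np

def fal(e, alpha, delta):
    # nonlinear observer gain: fractional power outside the linear band,
    # linear inside |e| <= delta to avoid chattering near the origin
    if abs(e) > delta:
        return abs(e) ** alpha * np.sign(e)
    return e / delta ** (1.0 - alpha)

# Second-order observer: z1 tracks theta, z2 estimates dtheta/dt.
# Gains follow the values quoted in the text; theta here is synthetic.
beta1, beta2 = 100.0, 900.0
alpha_o, delta_o = 0.5, 0.05
dt = 0.001
t = np.arange(0.0, 10.0, dt)
theta = np.sin(2.0 * t)

z1, z2 = 0.0, 0.0
z2_hist = np.zeros_like(t)
for i, th in enumerate(theta):
    e = z1 - th
    z1 += dt * (z2 - beta1 * e)                     # dz1/dt = z2 - beta1*e
    z2 += dt * (-beta2 * fal(e, alpha_o, delta_o))  # dz2/dt = -beta2*fal(e)
    z2_hist[i] = z2

# after the transient, z2 should follow dtheta/dt = 2*cos(2t)
err = np.abs(z2_hist[5000:] - 2.0 * np.cos(2.0 * t[5000:]))
print(float(err.max()))
```

The residual error after the transient reflects the observer bandwidth set by β₁ and β₂; larger gains track faster signals at the cost of amplifying measurement noise.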
This highlights the ability of the proposed system identification algorithm to estimate the time delay and to identify the nonlinearities in the system when linear models are no longer adequate. The motor parameter a is known and k_p is the given control gain. Hence, from the coefficient −a k_p τ²/2, the time delay can be calculated; it is listed in Table 5 in comparison with the time delay we introduced in the control. The values of the parameters in Table 5 fall in a wide range from 0.002 to 2. Hence, the parameters can be several orders of magnitude apart, which makes it difficult to accurately estimate all the parameters in the presence of unwanted noise. This experimental study strongly demonstrates the robustness of the proposed algorithm to noise.

Conclusions
In summary, we have demonstrated that the proposed algorithm is effective in obtaining the governing equations of nonlinear dynamical systems with time delay from noisy experimental data. It can accurately estimate the time delay in the feedback control. For the first time, this study extends the sparse regression to nonlinear dynamical systems with time delay. We have equipped the sparse regression with an algebraic signal pre-processing step and a nonlinear state observer. These operations are essential to compute the needed derivatives of the measured time series and other signals without knowledge of the initial conditions, and to filter the noise due to random excitations and measurements. Both simulation and experimental results have been used to validate the algorithm. The algorithm demonstrates excellent performance in identification.
The codes and all the data are available at: https://github.com/gleylaz/TimeDelay_SysID.