Influence of the optimization methods on neural state estimation quality of the drive system with elasticity

The paper deals with the implementation of optimized neural networks (NNs) for state variable estimation of the drive system with an elastic joint. The signals estimated by NNs are used in the control structure with a state-space controller and additional feedbacks from the shaft torque and the load speed. High estimation quality is very important for the correct operation of a closed-loop system. The precision of state variables estimation depends on the generalization properties of NNs. A short review of optimization methods of the NN is presented. Two techniques typical for regularization and pruning methods are described and tested in detail: the Bayesian regularization and the Optimal Brain Damage methods. Simulation results show good precision of both optimized neural estimators for a wide range of changes of the load speed and the load torque, not only for nominal but also changed parameters of the drive system. The simulation results are verified in a laboratory setup.


Introduction
In most electrical drives, the elasticity of the shaft between the driving motor and the load machine must be taken into account. To obtain a highly dynamic drive response to a reference signal and to minimize torsional vibrations, different control methods for the drive system with an elastic joint are used, such as PI/PID methods, state controller-based methods, sliding-mode control, and adaptive or predictive control [1][2][3][4][5][6]. All these control methods require feedback from different mechanical state variables of the system (load side speed, torsional torque, load torque). These mechanical variables can be measured, but only in laboratory environments. In real industrial drive systems, the torsional or load torque cannot be measured, because a torque transducer is not mounted between the driving motor and the load machine owing to the lack of space and the high additional cost. Similarly, the load side speed is rarely measured because of the lack of space for an additional speed transducer and the troublesome additional cabling. In such cases, estimation of those state variables is the only solution under industrial conditions. This is why the torsional torque and the load side speed of a two-mass system have to be estimated.
In many applications connected with electrical drives, algorithmic methods are applied to estimate the non-measurable state variables, for example, Kalman filters [4,5] and Luenberger observers [6]. However, algorithmic estimators require a mathematical model and knowledge of the system parameters, which can change during system operation, so to obtain good estimation quality the parameters of the state estimators must be tuned on-line (by on-line identification or estimation of the plant parameters). An alternative way of solving this problem is to use estimators based on neural networks (NNs). Such estimators do not need a mathematical model or the parameters of the system; only training data are required for the estimator design [7][8][9]. Moreover, thanks to their generalization ability, neural estimators are less sensitive to uncertainties in the parameters or measurement signals.
However, in the case of NN applications in state variable estimation, the determination of the NN structure for a specific task is one of the most important problems. This structure should be carefully chosen to obtain good estimation quality also for NN input data different from those used in the training procedure. This means that a suitable generalization ability is required. Data generalization is one of the main advantages of NNs: a trained network can solve a given task even when the elements of the input vector were not used in the NN training process. In the technical literature, many methods for improving the NN generalization properties are presented. It is possible to distinguish three main trends [7]:
• influencing the length of the learning process (early stopping) [10],
• application of a regularization method [11],
• modification of the neural network topology (growing or pruning) [12,13].
Many methods for NN structure optimization are presented in the literature. Most of them require an initial choice of the NN structure, after which selected neural connections are eliminated. One of the simplest ways to choose a specific inter-node connection for elimination is to analyze the absolute values of the NN weights. Another method consists in checking the influence of each connection on the generalization error. In this case, the generalization errors before and after the elimination of each weight factor are compared [7,14].
Very good results are obtained with the sensitivity methods. These algorithms are based on the analysis of sensitivity of the cost function to deletion of individual connections. The most important methods in this category are the Optimal Brain Damage (OBD) [15,16] and Optimal Brain Surgeon [17,18] methods.
In many techniques, genetic algorithms are also applied for pruning the inter-neural connections [19].
Another solution is to add a regularization element to the cost function [17]. It consists in modifying the objective function used in the training algorithm, which is then minimized in each iteration. In the extended form of such a cost function, elements dependent on the values of the inter-neural connection weights are added to the standard cost function; then, the problem of selecting the regularization parameters in the modified objective function appears. In this work, the regularization method based on the Bayesian interpretation of NNs is applied. This algorithm gives analytical formulas for the automatic computation of optimal regularization parameters [20,21].
It is reported in the literature that the Bayesian regularization method can significantly improve the quality of state variable estimation. So in this paper, the effectiveness of this method is compared with the previously used OBD method (which is rather complicated in practice [16]) for the NN state estimators of a two-mass drive system. This paper presents neural estimators of the torsional torque and the load machine speed for a drive system with elastic joints. These neural estimators are trained with the classical Levenberg-Marquardt method [7] and are then optimized using the OBD and Bayesian regularization methods. The obtained estimators are tested in the open-loop and closed-loop control structures, with additional feedbacks adjusted to damp the torsional vibrations of the drive system with an elastic coupling between the driving motor and the load mechanism.
The paper is divided into seven sections. After a short introduction, the mathematical model of the two-mass drive system is presented. Then, the speed control structure with a state controller and feedbacks from the motor speed, the shaft torque, and the load speed is described. The last two state variables are estimated by the tested NNs, whereas the motor speed and the motor current are measured directly and form the input vectors of the NN estimators. In the next part, the selection of the NN input vector for the analyzed task is discussed. In the fourth part, the chosen methods for improving the NN generalization properties are described. This paper is focused on two methods: Bayesian regularization and the OBD method. The designed NN estimators are then implemented in the control structure and tested in simulation (section five) and in experiments (section six). The paper is completed with short conclusions.

Design of the state controller for a two-mass drive system
The electrical drive with an elastic joint can be described by different mathematical models, depending on how exactly the elastic shaft is modeled. Usually, such a drive is analyzed as a system composed of two masses connected by an elastic shaft, where the first mass represents the moment of inertia of the drive and the second mass represents the moment of inertia of the load side (see Fig. 1). It is assumed that the moment of inertia of the elastic shaft J_c is much smaller than the moments of inertia of the driving motor J_1 and the load machine J_2, so the moment of inertia of the shaft is neglected.
For further considerations, the damping coefficient D of the elastic shaft is assumed to be zero, which emphasizes the influence of the shaft elasticity on the drive system operation. Moreover, nonlinear phenomena such as friction and backlash are omitted; thus, the mechanical part of the considered two-mass drive can be described by a state equation in the per-unit system [2]. In this description, Ω_1, Ω_2, and Ω_N are the motor speed, the load side speed, and the nominal speed of the motor (rad/s); M_N is the nominal torque of the motor (Nm); ω_1 and ω_2 are the motor and load speeds; and m_e, m_s, and m_L are the electromagnetic, shaft, and load torques in the per-unit system. The parameters of the model are the mechanical time constants of the motor T_1 and of the load machine T_2 and the stiffness time constant T_c. The block diagram of such a system with an elastic connection between the motor and the load machine is shown as a part of Fig. 2 (dashed-line rectangle).
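The verbal description above can be sketched in code. The block below is a minimal illustration that assumes the standard per-unit form of a two-mass model with zero shaft damping; the function names are hypothetical, and the numerical values are the time constants quoted later in the simulation section. It also evaluates the undamped torsional resonance frequency that follows from this model.

```python
import math


def two_mass_derivatives(state, m_e, m_L, T1, T2, Tc):
    """Right-hand side of the assumed per-unit two-mass model (shaft damping D = 0).

    state = (w1, w2, m_s): motor speed, load speed, shaft torque (all per unit).
    """
    w1, w2, m_s = state
    dw1 = (m_e - m_s) / T1   # motor side: electromagnetic torque minus shaft torque
    dw2 = (m_s - m_L) / T2   # load side: shaft torque minus load torque
    dms = (w1 - w2) / Tc     # shaft torque grows with the speed difference
    return (dw1, dw2, dms)


def resonant_frequency(T1, T2, Tc):
    """Undamped torsional resonance frequency (rad/s) of the model above."""
    return math.sqrt((T1 + T2) / (T1 * T2 * Tc))


# Time constants quoted in the simulation section: T1 = T2 = 203 ms, Tc = 2.6 ms.
print(resonant_frequency(0.203, 0.203, 0.0026))
```

With zero damping, any excitation near this frequency is not attenuated by the plant itself, which is why the controller has to provide the damping.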
The classical cascade control structure of such drive system consists of two major control loops: the inner control loop contains the current controller, the power converter, and the motor. After optimization, the current (or torque) control loop can be replaced by the first-order inertial block with small time constant. During the design process of the speed loop, the dynamics of the torque loop is very often neglected [2]. In most cases, the PI speed controller is used in the external control loop. In this paper, the state-space controller with an integral action for steady-state error elimination is applied for the speed control of the drive system with elasticity (see Fig. 2).
The required value of the electromagnetic torque generated by the motor is described by an equation in which the dynamics of the torque loop are neglected and negative feedbacks from all state variables are used. Applying the Laplace transform (with s denoting the Laplace operator) to the mathematical model of the drive system (1-3) and (5), and then including (9) in (6), gives a set of equations in the s-domain. Introducing (12) into (11) and (13) and transforming the resulting set of equations gives Eq. (16), which enables the determination of the transfer function of the closed-loop control system, with R replaced by (10). To calculate the expressions defining the gains of the designed state controller, the characteristic equation of the closed-loop system (18) has to be compared with a reference polynomial of the same order. This polynomial is parameterized by ξ_r and ω_o, the required damping factor and resonance frequency of the closed-loop system. Comparing the coefficients at the same powers of the Laplace operator s yields the expressions (20)-(23) for the gains of the state controller.
Feedbacks from all mechanical state variables of the two-mass system are introduced into the external control loop, so information about the shaft torque m_s, the motor speed ω_1, and the load speed ω_2 is needed. Measuring the motor speed ω_1 is simple and trouble-free, but measuring the shaft torque and the load speed can be difficult or expensive in industrial practice. In this case, special estimation structures based on neural networks can be used to estimate these variables from the easily measurable speed and current of the driving motor (the electromagnetic torque of the motor is proportional to this current). So in the control structure, the measured motor speed ω_1 and the estimated variables, the shaft torque m_se and the load speed ω_2e, are used (see Fig. 2).
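The coefficient comparison described above can be illustrated with a short sketch. It rests on two assumptions that are not spelled out in the surviving text: the control law is taken as m_e = K_I ∫(ω_ref − ω_2) dt − k_1 ω_1 − k_2 m_s − k_3 ω_2, and the reference polynomial as (s² + 2 ξ_r ω_o s + ω_o²)². The gain names k1, k2, k3, and KI are hypothetical labels. Under these assumptions, the closed-loop characteristic polynomial of the undamped two-mass model is T_1 T_2 T_c s⁴ + k_1 T_2 T_c s³ + (T_1 + T_2 + k_2 T_2) s² + (k_1 + k_3) s + K_I.

```python
def state_controller_gains(T1, T2, Tc, xi, w0):
    """Gains obtained by matching coefficients with (s^2 + 2*xi*w0*s + w0^2)^2."""
    k1 = 4.0 * xi * w0 * T1
    k2 = T1 * Tc * (2.0 + 4.0 * xi ** 2) * w0 ** 2 - 1.0 - T1 / T2
    k3 = 4.0 * xi * w0 ** 3 * T1 * T2 * Tc - k1
    KI = w0 ** 4 * T1 * T2 * Tc
    return k1, k2, k3, KI


def closed_loop_char_poly(T1, T2, Tc, k1, k2, k3, KI):
    """Monic closed-loop polynomial coefficients for s^3, s^2, s^1, s^0."""
    a = T1 * T2 * Tc
    return [k1 / T1, (T1 + T2 * (1.0 + k2)) / a, (k1 + k3) / a, KI / a]


def reference_poly(xi, w0):
    """Coefficients of (s^2 + 2*xi*w0*s + w0^2)^2 for s^3, s^2, s^1, s^0."""
    return [4.0 * xi * w0,
            (2.0 + 4.0 * xi ** 2) * w0 ** 2,
            4.0 * xi * w0 ** 3,
            w0 ** 4]


# Values quoted in the simulation section: T1 = T2 = 203 ms, Tc = 2.6 ms,
# damping factor 0.7 and resonance frequency 45 1/s.
gains = state_controller_gains(0.203, 0.203, 0.0026, 0.7, 45.0)
print(gains)
print(closed_loop_char_poly(0.203, 0.203, 0.0026, *gains))
print(reference_poly(0.7, 45.0))
```

The last two printed lists should agree, which is a quick consistency check of the gain expressions under the stated assumptions.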

Neural network based state variables estimators
As mentioned above, the mechanical state variables required as feedback signals in the control structure of the drive system with an elastic joint are estimated by NN-based estimators. For this research, feed-forward NNs were selected. Previous research shows that this type of NN can give a high precision of state variable estimation for the two-mass drive system [8], but the selection of a proper NN structure is difficult and usually done by trial and error, which is time consuming. To avoid this problem, an initial NN structure was selected (after a few preliminary simulation tests) and then optimized using two optimization methods well known from neural network theory.
The starting structures are the same for both presented estimators, for the load side speed ω_2e and for the shaft torque m_se: {6-10-12-1}, i.e., 6 inputs, 10 neurons in the first hidden layer, 12 neurons in the second hidden layer, and 1 neuron in the output layer. For the hidden layers, nonlinear hyperbolic tangent activation functions are applied. A linear activation function is selected as the output function of the considered neural estimators.
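To make the size of the starting structure concrete, the sketch below builds a {6-10-12-1} feed-forward network with hyperbolic tangent hidden layers and a linear output and counts its adjustable parameters. The random initialization range and all function names are assumptions made for illustration only; they are not taken from the paper.

```python
import math
import random

LAYER_SIZES = [6, 10, 12, 1]  # inputs, two hidden layers, output


def count_parameters(sizes):
    """Number of weights plus biases in a fully connected feed-forward network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def init_network(sizes, seed=0):
    """Random weights and biases (illustrative initialization, not the paper's)."""
    rng = random.Random(seed)
    net = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
        net.append((weights, biases))
    return net


def forward(net, x):
    """Forward pass: tanh in the hidden layers, linear activation in the output layer."""
    a = list(x)
    for idx, (weights, biases) in enumerate(net):
        z = [sum(w * v for w, v in zip(row, a)) + b for row, b in zip(weights, biases)]
        is_output = idx == len(net) - 1
        a = z if is_output else [math.tanh(v) for v in z]
    return a


print(count_parameters(LAYER_SIZES))  # 215 weights and biases before pruning
print(forward(init_network(LAYER_SIZES), [0.0] * 6))
```

The count of 215 parameters (70 + 132 + 13) gives a sense of how many connections the pruning and regularization methods have to deal with.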
The proper selection of the elements of the NN input vector is very important for the correct realization of the required task. The selection of input elements in the design process of neural estimators should take into account the properties of NNs and the practical aspects of the analyzed implementation. It should be noted that expanding the input vector of the NN can influence the structure of the net and, as a result, its practical implementation (e.g., the calculation time and resource consumption of an FPGA). At the same time, with an expanded input vector, the results of NN calculations may improve slightly or, on the contrary, may even get worse. From the engineering point of view, the signals included in the NN input vector should be selected carefully to fulfill the following conditions:
• they should give important information about changes of the state variables of the process,
• they should be easily measured in the real system.
According to these requirements, the input signals of the neural networks in our case are the motor speed ω_1 and the electromagnetic torque m_e (or stator current) of the driving motor.
In the presented application of neural estimation, multi-layer neural networks (MLNNs) were implemented. It should be noted that the NNs analyzed in the described application are static systems; they have no internal feedbacks or memory. On the other hand, the presented application is focused on dynamical signals of the drive system, which change quickly in time. Therefore, to take into account the dynamics of the processed signals and to obtain a better quality of state variable estimation, the input vector of the MLNN was extended with delayed samples of the input variables (motor speed and electromagnetic torque), which gives the six inputs of the assumed structure.
The NN estimators were first trained with the Levenberg-Marquardt algorithm, in which the weights are updated according to

$$\Delta w = -\left(J^{T} J + \eta I\right)^{-1} J^{T} e$$

where J is the Jacobian matrix of the errors entering the cost function E with respect to the weight values, η is the learning factor, I is the identity matrix, and e is the difference between the target output of the training data and the network output. Next, the previously selected structure of the NN estimators was optimized using the Bayesian regularization and OBD methods. The effectiveness of these methods in the described task has been compared and evaluated.
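The extension of the input vector with delayed samples can be sketched as a tapped delay line. The exact delays used by the authors are not recoverable from the text, so the block below assumes the current sample plus two delayed samples of each signal, which matches the six network inputs; the class name and its interface are hypothetical.

```python
from collections import deque


class DelayedInputBuilder:
    """Builds [w1(k), w1(k-1), w1(k-2), m_e(k), m_e(k-1), m_e(k-2)] (assumed layout)."""

    def __init__(self, n_delays=2):
        n = n_delays + 1
        # Buffers start at zero, i.e. the drive is assumed to be at standstill.
        self.speed = deque([0.0] * n, maxlen=n)
        self.torque = deque([0.0] * n, maxlen=n)

    def update(self, w1, m_e):
        """Push the newest measurements and return the current NN input vector."""
        self.speed.appendleft(w1)    # the oldest sample drops off the right end
        self.torque.appendleft(m_e)
        return list(self.speed) + list(self.torque)


builder = DelayedInputBuilder()
for k, (w1, m_e) in enumerate([(0.1, 1.0), (0.2, 0.9), (0.3, 0.8)]):
    print(k, builder.update(w1, m_e))
```

Because the network itself is static, this buffer is the only place where information about the signal dynamics enters the estimator.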

Bayesian regularization method
The neural network training process can be defined as the minimization of an objective function. In the considered case, the analyzed cost function (26) is a weighted sum of two terms,

$$F = \beta E_D + \alpha E_W$$

where E_D is the sum of squares of the NN calculation errors over all input samples and E_W is an additional regularization term built from the network weights. In these terms, d_j are the desired output values, y_j are the actual output values of the neuron, M is the dimension of the vector d, w_i are the weights, and W is the total number of weights and biases in the network.
In relation to the objective function (26), the problem of selecting the parameters α and β appears. These regularization parameters describe the influence of the terms E_D and E_W on the cost function. The first term determines the NN accuracy with respect to the training data, and the second one enforces the smoothness of the NN output [20,21]. If β is large in comparison with α, the training error is smaller and the effect is similar to that of a classical algorithm. In the opposite case, the training process gives smaller weights and leads to a smoother network output. Therefore, suitable values of these factors are extremely important for achieving good estimation quality. In many cases, these parameters can be chosen using cross-validation techniques, but this procedure is time consuming. In the Bayesian interpretation of NNs, the optimization of the inter-neural weights corresponds to maximizing the probability

$$P(w \mid D, \alpha, \beta, A) = \frac{P(D \mid w, \beta, A)\, P(w \mid \alpha, A)}{P(D \mid \alpha, \beta, A)} \qquad (29)$$

where w is the weight coefficient vector, D is the training data, A is the structure of the neural network, P(D|α,β,A) is the normalization element, P(w|α,A) describes the information on the weight values before the training data are introduced, and P(D|w,β,A) is the probability of obtaining the established response of the NN for the given inputs, depending on the parameters of the network. Under the assumption that the noise in the input data (measurements) used in NN training is Gaussian and that the probability distribution of the weights is also Gaussian, the elements of Eq. (29) can be written in closed form. For the optimization of the parameters α and β in the objective function, the following relation is taken into account:

$$P(\alpha, \beta \mid D, A) = \frac{P(D \mid \alpha, \beta, A)\, P(\alpha, \beta \mid A)}{P(D \mid A)}$$

Under the assumption that the distribution of the regularization coefficients α and β is uniform, the maximal values of the probability P(α,β|D,A) are obtained for the largest values of the element P(D|α,β,A). The probability P(D|A) is independent of the required parameters.
After suitable transformations [20,21], the equations describing the parameters α and β at the minimum of the objective function are obtained. They are expressed in terms of w_MP, the minimum point of the objective function, and H, the Hessian matrix of the cost function. The parameter γ denotes the effective number of parameters of the NN, whereas W is the number of all parameters in the NN.
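The re-estimation of the regularization parameters can be sketched on a toy problem. The block below is not the authors' implementation: it assumes the commonly used update rules γ = W − 2α·tr(H⁻¹), α = γ/(2E_W), and β = (n − γ)/(2E_D), with E_D and E_W taken as plain sums of squares and n the number of training samples. A one-weight linear model y = w·x is used so that the minimum of the objective function and its Hessian are available in closed form; all names are hypothetical.

```python
def update_hyperparameters(E_D, E_W, trace_H_inv, alpha, n_params, n_samples):
    """One re-estimation step of (gamma, alpha, beta) under the assumed update rules."""
    gamma = n_params - 2.0 * alpha * trace_H_inv   # effective number of parameters
    alpha_new = gamma / (2.0 * E_W)
    beta_new = (n_samples - gamma) / (2.0 * E_D)
    return gamma, alpha_new, beta_new


def fit_bayesian_regularized(xs, ds, alpha=0.01, beta=1.0, iterations=20):
    """Toy example: y = w * x with F = beta * E_D + alpha * E_W."""
    n = len(xs)
    sxx = sum(x * x for x in xs)
    sxd = sum(x * d for x, d in zip(xs, ds))
    for _ in range(iterations):
        w = beta * sxd / (beta * sxx + alpha)              # minimum of F for current alpha, beta
        E_D = sum((d - w * x) ** 2 for x, d in zip(xs, ds))
        E_W = w * w
        hessian = 2.0 * beta * sxx + 2.0 * alpha           # second derivative of F (a scalar here)
        gamma, alpha, beta = update_hyperparameters(
            E_D, E_W, 1.0 / hessian, alpha, n_params=1, n_samples=n)
    return w, alpha, beta, gamma


# Slightly noisy samples of y = x (illustrative data, not from the paper).
print(fit_bayesian_regularized([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9]))
```

In this toy case γ stays between 0 and 1, the number of weights, which illustrates its reading as the number of parameters effectively used by the model. In a real network the trace of the inverse Hessian has to be approximated, for example from the Jacobian already computed by the Levenberg-Marquardt algorithm.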

Optimal Brain Damage method
Neural network training leads to the minimization of a cost function defined as the mean square error between the estimated and the real value.
The cost function for a p-element learning vector, Eq. (39), is differentiable and continuous, which makes it possible to use gradient methods for its minimization. The first step in this method is the expansion of the cost function into a Taylor series around the current solution, Eq. (40):

$$\delta E = \sum_i g_i\, \Delta w_i + \frac{1}{2} \sum_i \sum_j h_{ij}\, \Delta w_i\, \Delta w_j + O\left(\lVert \Delta w \rVert^{3}\right)$$

where Δw_i is the change of the i-th weight, g_i = ∂E/∂w_i are the elements of the gradient, and h_ij = ∂²E/∂w_i∂w_j are the elements of the Hessian matrix. In the OBD algorithm, the weight coefficients are eliminated after full training of the net, so the elements related to the gradient can be assumed equal to zero and skipped in Eq. (40). The Hessian matrix is assumed to be diagonally dominant, which makes it possible to include only its diagonal elements h_ii in the presented algorithm. The quadratic approximation assumes that the cost function is quadratic, so the third element in Eq. (40) can be neglected. Following the above assumptions, the saliency coefficient is described by the following relation [15]:

$$S_i = \frac{1}{2} h_{ii}\, w_i^{2}$$

These coefficients give information about the influence of the respective connections in the NN on the training process. The weights with the smallest saliency are eliminated. The algorithm of the OBD method is thus as follows:
1. Choice of a reasonable topology of the neural network.
2. Full training of the net.
3. Computation of the diagonal elements h_ii of the Hessian matrix.
4. Evaluation of the saliency parameters S_i for all weight coefficients.
5. Deletion of the elements with the smallest saliency.
6. If weight connections were deleted, return to step 2 with the reduced topology of the neural network.
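Steps 3 to 5 of the procedure above can be sketched as follows. The block assumes that the diagonal Hessian elements are already available (how they are computed depends on the network and is not shown), and the function names and numerical values are illustrative only.

```python
def obd_saliencies(weights, hessian_diag):
    """OBD saliency S_i = 0.5 * h_ii * w_i^2 for every weight."""
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]


def prune_smallest(weights, hessian_diag, n_prune):
    """Zero out the n_prune weights with the smallest saliency.

    Returns the pruned weights and a boolean mask (True = connection kept).
    """
    saliency = obd_saliencies(weights, hessian_diag)
    order = sorted(range(len(weights)), key=lambda i: saliency[i])
    removed = set(order[:n_prune])
    pruned = [0.0 if i in removed else w for i, w in enumerate(weights)]
    mask = [i not in removed for i in range(len(weights))]
    return pruned, mask


# Illustrative values: the second weight is the largest in magnitude, but the
# cost function is almost flat along it, so OBD removes it first.
weights = [0.5, -2.0, 0.1, 1.0]
hessian_diag = [4.0, 0.01, 10.0, 1.0]
print(obd_saliencies(weights, hessian_diag))
print(prune_smallest(weights, hessian_diag, n_prune=2))
```

The example shows how OBD differs from pruning by absolute weight value: a large weight with small curvature is removed before a small weight with large curvature. After pruning, the mask would be kept fixed while the reduced network is retrained (step 6).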

Simulation results
The NN estimators are tested in the control structure presented in Fig. 2. The main parameters of the drive system are as follows: T_1 = T_2 = 203 ms and T_c = 2.6 ms. The assumed values of the resonant frequency and the damping factor of the closed speed loop of the drive system are, respectively, ω_o = 45 s⁻¹ and ξ_r = 0.7. To reduce the disturbances caused by the high dynamics of the input signals and by measurement noise, low-pass filters with the time constant T = 5 ms are used. The first results are presented for NNs trained with the Levenberg-Marquardt algorithm, without any additional techniques. The neural estimators are tested first in the open control loop, which means that the control structure of the two-mass system uses the state variables obtained directly from the mathematical model of the drive, and the signals estimated by the designed NNs are not used in this structure. The obtained results are shown in Fig. 3.
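The open-loop test setup, in which the controller uses the true state variables of the model, can be approximated by a short simulation. The sketch below is only an illustration under stated assumptions: the undamped per-unit two-mass model, an integral state controller with gains from coefficient matching against (s² + 2 ξ_r ω_o s + ω_o²)², an ideal torque loop, no low-pass filters, and forward Euler integration. The reference speed of 0.2, the load torque of 0.5, and the load application time are hypothetical test values, and all names are chosen for this example.

```python
def simulate_closed_loop(T1=0.203, T2=0.203, Tc=0.0026, xi=0.7, w0=45.0,
                         w_ref=0.2, m_L=0.5, t_load=0.5, t_end=1.5, dt=1e-4):
    """Forward-Euler simulation of the state-controlled two-mass drive (per unit)."""
    # Controller gains from matching the closed-loop characteristic polynomial
    # with the assumed reference polynomial (s^2 + 2*xi*w0*s + w0^2)^2.
    k1 = 4.0 * xi * w0 * T1
    k2 = T1 * Tc * (2.0 + 4.0 * xi ** 2) * w0 ** 2 - 1.0 - T1 / T2
    k3 = 4.0 * xi * w0 ** 3 * T1 * T2 * Tc - k1
    KI = w0 ** 4 * T1 * T2 * Tc

    w1 = w2 = m_s = integ = 0.0
    history = []
    for step in range(int(round(t_end / dt))):
        t = step * dt
        load = m_L if t >= t_load else 0.0
        # Control law: integral of the speed error minus state feedbacks.
        m_e = integ - k1 * w1 - k2 * m_s - k3 * w2
        # Derivatives of the plant and of the integrator state.
        dw1 = (m_e - m_s) / T1
        dw2 = (m_s - load) / T2
        dms = (w1 - w2) / Tc
        dint = KI * (w_ref - w2)
        # Explicit Euler update.
        w1 += dt * dw1
        w2 += dt * dw2
        m_s += dt * dms
        integ += dt * dint
        history.append((t, w1, w2, m_s))
    return history


t, w1, w2, m_s = simulate_closed_loop()[-1]
print(t, w1, w2, m_s)
```

Replacing w2 and m_s in the control law with estimated values turns this into the closed-loop test, where any estimation error is fed back through the gains.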
In order to evaluate the quality of estimation of the load machine speed ω_2e and the shaft torque m_se, the estimation errors of the NNs are calculated using formula (44), in which x_i is the real value, x̂_i is the estimated value, and N is the number of samples.
The estimation errors (average error per sample) calculated for the transients presented in Fig. 3 are 5.77 for the load speed and 0.63 for the shaft torque, respectively.
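An average error per sample can be computed as in the sketch below. The exact form of the error formula is not recoverable from the text, so the block assumes a mean absolute error between the real and estimated values; any additional scaling used by the authors (for example, expressing the result as a percentage) is not reproduced here, and the function name is hypothetical.

```python
def mean_abs_error(real, estimated):
    """Average absolute difference between real and estimated samples (assumed form)."""
    if len(real) != len(estimated) or not real:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(x - x_hat) for x, x_hat in zip(real, estimated)) / len(real)


# Illustrative samples only, not data from the paper.
real = [1.0, 2.0, 3.0]
estimated = [1.5, 2.0, 2.0]
print(mean_abs_error(real, estimated))
```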
Next, the estimated signals were introduced into the control structure, and the obtained results are shown in Fig. 4. As can be seen from those transients, the neural estimators prepared using the Levenberg-Marquardt algorithm failed when tested in the closed control loop.
The NNs are not considered during the design of the control structure. The coefficient values (20)(21)(22)(23) in the respective feedback loops are calculated under the assumption that the feedback signals are known exactly, so in the case of estimated variables, these "ideal" coefficients amplify the dynamical estimation errors, and thus quite significant interferences appear. Oscillations of the state variables are excited in the closed-loop structure, so proper control is impossible. These phenomena may damage the coupling elements between the motor and the load machine. So high quality of state variable estimation is necessary for the correct operation of the closed-loop drive system. Thus, the optimization methods for NNs are introduced.
Fig. 4 Transients of the real and estimated load speed ω_2 (a) and torsional torque m_s (b), and their estimation errors, obtained from NNs trained with the LM algorithm and tested in the closed-loop system
First, the application of Bayesian regularization in neural estimators is tested, and the operation of the obtained NNs implemented in the closed-loop system is presented in Fig. 5, also for changed values of the load side time constant T_2.
The obtained results are very good: using the modified cost function (26-28) during NN training eliminates excessively large weight coefficients of the designed neural estimators and thus prevents the oscillations that previously appeared in closed-loop operation. Correct operation of the designed estimators in the closed control loop is ensured even for changed values of the load mechanism time constant T_2 (Fig. 5c-f).
However, the best quality of state estimation is achieved for NN structures optimized with the OBD method. The obtained results of closed-loop operation are shown in Fig. 6. The Hinton diagrams in Fig. 7 visualize the matrices of bias and weight values. Each value is represented by a rectangle whose size corresponds to the weight magnitude and whose color indicates the sign (positive: red, negative: green). The OBD algorithm eliminates individual inter-neural connections; however, some neurons are eliminated completely by this optimization process (Fig. 7d, e).
Table 1 compares the estimation errors, calculated according to (44), for both tested methods. Both described methods make it possible to prepare neural estimators that give correct results after implementation in the closed-loop structure of the two-mass drive system.
The OBD method can give better results, but it requires higher computational power, and its training process is much slower than that of the Bayesian regularization method. The pruning methods are also important when a hardware implementation of the neural estimators is considered. The structure of the NN can be reduced, which simplifies the realization algorithm and, as a result, saves hardware resources (e.g., in the case of an FPGA implementation).
Fig. 6 Transients of the real and estimated load speed ω_2 (a, c, e) and torsional torque m_s (b, d, f), and their estimation errors, obtained from NNs trained with the LM algorithm and the OBD method, tested in the closed loop for different values of the T_2 time constant

Experimental results
The tested drive system with an elastic joint is emulated with two DC machines (0.5 kW each) connected by an elastic shaft (a steel shaft of 5 mm diameter and 600 mm length). The stiffness of the connection depends on the shaft diameter. The motor is fed by a power converter. The control algorithm and the neural state estimators are implemented in the DSP of the dSPACE 1102 card. The load machine in the drive system is also controlled using the DSP (see Fig. 8). Basic parameters of the drive system are presented in Table 2. The speeds of both DC machines are measured by incremental encoders (36 000 pulses per rotation); however, the speed measurement of the load machine is used only for comparison with the estimated value. LEM sensors are used for current measurement in the laboratory setup. There is no shaft torque sensor in the laboratory setup. Therefore, in order to check the shape of the estimated shaft torque, a Kalman filter is applied [4,22]. Pictures of the laboratory test bench are presented in Fig. 9.
Fig. 7 Hinton diagrams for the load speed estimator illustrating weight values between the input and first hidden layers (a, d), between the two hidden layers (b, e), and between the second hidden and output layers (c, f), before the OBD method (a-c) and after weight elimination (d-f)
Exemplary transients of the state estimation in the closed-loop drive structure, obtained before and after NN optimization, are presented in Fig. 10.
The tests are carried out for a reference speed equal to 20 % of its nominal value; after one second, the reverse operation of the drive system is forced. In the period t ∈ (0.5, 1.5) s, the nominal load torque is applied. Next, this load is removed. As in the simulation, the neural estimators trained with the Levenberg-Marquardt algorithm generate noise and cause instability of the closed-loop drive system (Fig. 10a, b). After optimization with the described algorithms, both estimators work properly. The estimation errors (44) for both optimization algorithms are similar in the experimental tests: for the NN optimized with Bayesian regularization, the load speed error is 0.37 and the torsional torque error is 2.77. For the neural estimator designed with the OBD method, the errors of the load speed and torsional torque reconstruction are 0.49 and 2.99, respectively. Additional tests with the T_2 time constant doubled were conducted. The results are presented in Fig. 11.
As in the simulation tests, the change of the two-mass system parameter is not taken into account during neural estimator training, and the coefficients in the control structure are calculated for the nominal value of T_2. The estimation error of the load speed equals 0.67 for the Bayesian regularization method and 0.83 for the OBD method. The torsional torque reconstruction error equals 2.76 for the neural estimator optimized with the Bayesian regularization method and 2.95 for the one optimized with the OBD method. Comparing the errors obtained for the different values of T_2 in the drive system, similar values can be observed. The conclusion is that the obtained neural estimators are robust against changes of the tested drive parameter.

Conclusion
The application of neural estimators in the drive system with an elastic coupling enables very good estimation quality. In contrast to the algorithmic methods of state variable estimation, neural estimators do not require knowledge of the parameters of the mathematical model. Disturbances in the estimated signals, which are connected as additional feedbacks in the two-mass drive control structure, can lead to speed oscillations or even to instability of the system. So good estimation quality is very important. After the implementation of Bayesian regularization or OBD, the precision of the NN calculations is much better. The presented estimators are also robust to changes of the mechanical parameters of the drive, such as the load side time constant T_2. The OBD method can give slightly better results, but it should be noted that this method requires higher computational power and that its training process is much slower than that of the Bayesian regularization method. Thus, the Bayesian regularization method can be recommended for practical implementation. The correct operation of the designed estimators was confirmed not only by simulations but also by experiments on the real drive system in the laboratory.
[Figure caption, truncated in the source: "... regularization (a, b), and the OBD method (c, d)"]