Rhythmic-Reflex Hybrid Adaptive Walking Control of Biped Robot

For central pattern generator (CPG)-inspired biped walking control algorithms, it is hard to coordinate all the degrees of freedom of a robot by regulating the parameters of a neural network to achieve stable and adaptive walking. In this work, a hybrid rhythmic-reflex control method is presented that realizes stable and adaptive biped walking. By integrating zero moment point (ZMP) information, the walking stability on flat terrain is improved. The robot's body attitude information is used to modulate the control system in real time to realize adaptive walking on sloped terrain. A staged parameter evolution process is used to derive the parameters. Through the entrainment of the oscillatory network with the feedback information, the real-time joint control signals are regulated to realize adaptive walking. The presented control strategy has been verified on a biped robot restricted to the sagittal plane, and the experiments show that the robot can successfully walk adaptively on terrain with changing slope.


Introduction
Biped robots have been attracting considerable attention owing to their adaptability to different walking environments, their obstacle avoidance ability and their applicability in fields such as the military, space exploration and ocean exploration. Meanwhile, biological research shows that human rhythmic movement is the joint result of inherent patterns and reflex behavior [1,2]. Locomotion control strategies inspired by the central pattern generator (CPG) have been proven to be workable [3][4][5][6]. A CPG is a kind of neural network that can produce rhythmic signals endogenously. By combining the inhibition of neurons and entrainment with feedback information, the motion patterns of the oscillatory network can be modulated. Thus far, this biologically inspired motion control method has been successfully used in swimming and crawling robots [7][8][9][10] and legged robots [11][12][13][14]. Inspired by Taga's work [15,16], this method is also increasingly used for biped walking control [17][18][19][20][21]. J. Or [17] developed a hybrid control system for a biped robot on the basis of a CPG and Zero Moment Point (ZMP) information. The ZMP-based CPG controller regulates the posture of a flexible spine, which maintains the walking balance. This approach is equivalent to combining ZMP-based offline trajectory planning with online balance compensation. He et al. [21] applied a set of cosine functions to generate the joint-space control signals for a robot. The ZMP information is used to ensure walking stability by adjusting the activity of the cosine functions. Fu et al. [22] presented a biped walking control method that combines a CPG with passive walking. Oscillators were used to control the hip joint; the triggering and ceasing of the oscillators during a walking cycle can be adjusted by feedback information. Wang et al. [23][24][25] presented a CPG-inspired control algorithm that achieves disturbance rejection for a biped robot composed of seven links. The control system controlled the joint torque and joint stiffness independently. The coordination among limb motions and feedback was introduced into the CPG model. The biped robot can achieve periodic walking with high stability at several different speeds.
The behavioral adaptability of CPG-inspired control methods to different environments is due to the adjustment of the motion pattern by sensory feedback. Previous works have shown that integrating feedback into the oscillator control system can improve the adaptability of walking to the environment [26][27][28][29]. In our work, a biological reflex is imitated to modulate the CPG system so that the biped robot adapts to various conditions without varying any model parameters other than the feedback to the oscillator model. The body attitude angle is used to imitate a biological mechanism, the vestibulospinal reflex, to modulate the oscillator signals automatically, and a ZMP-based feedback loop is added to improve the stability of biped walking. The advantage of the proposed method is that it realizes an adaptive and stable biped walking pattern in real time without requiring prior terrain information or relying on range sensors to measure the surface topology. To validate the presented control algorithm, simulations are conducted to generate stable and adaptive walking patterns on flat and sloped terrains.
Fig. 1 Structure of the whole control system
The rest of this paper is organized as follows. Section 2 describes the architecture of the presented rhythmic-reflex hybrid control system. Section 3 presents the model of the seven-link robot. The details of the proposed algorithm are described in Sections 4 and 5. Section 6 verifies the real-time performance and validity of the presented control algorithm by simulations, and Section 7 concludes the paper.

The Control Architecture
Many biped robots successfully utilize ZMP-based locomotion control methods [30,31]. Trajectory-based methods work well when a robot walks along pre-designed trajectories while maintaining its balance. However, pre-designed trajectories are fixed and may therefore fail if the terrain conditions change. In this work, a ZMP-CPG hybrid adaptive biped walking control method is presented. An overview of the proposed hybrid control system is shown in Fig. 1. It contains a joint trajectory generator and two reflex feedback loops. The CPG network generates rhythmic joint trajectories from given initial parameters, and a PD controller makes the robot track these ideal trajectories. The ZMP and body attitude information are used as feedback to modulate the CPG's output signals and generate a stable and adaptive walking pattern.
A robot with two legs and an upper body is used to verify the presented control method (Fig. 2). The details of the robot model are given in the next section. The selection of appropriate parameters plays a critical role in achieving the desired locomotor patterns. Trial-and-error and evolutionary algorithm (EA)-based methods [14,32,33] are usually used for parameter selection. In this work, a staged evolution is applied to search for the system parameters. In the first stage, numerical simulation is conducted to study the effect of every parameter on the output signals so that the preliminary parameters of the evolution process can be set. Secondly, a Non-dominated Sorting Genetic Algorithm-II (NSGA-II)-based evolution method is used to evolve the parameters to realize flat terrain walking. Finally, during the generation of sloped terrain walking gaits, adaptive walking patterns are achieved by entraining several loops of sensory feedback.
Fig. 2 The biped model in different spaces

Seven-Link Robot Model
The biped model is shown in Fig. 2, in which each leg comprises three links. The mass of each link is assumed to be uniformly distributed. As given in Eq. 1, the relationship between the absolute angle ($\theta_i$) and the relative angle ($q_i$) can be derived from geometric relations.
A complete walking period of the biped robot consists of a double-leg support phase, an impact phase and a swing phase. The dynamic model of the robot changes when the swing foot touches the ground. Therefore, the model of the biped robot is a hybrid dynamic system. In this work, the double-support phase is assumed to be an instantaneous and fully inelastic impact. The walking process is thus completed by alternating between the swing phase and the impact phase. The physical parameters of the robot are listed in Table 1.

Swing Phase Model
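Under the common assumptions for planar robots [34] and using the Lagrange equation, the robot model in the swing phase can be described as

$$M(\theta)\ddot{\theta} + C(\theta,\dot{\theta})\dot{\theta} + G(\theta) = Q\tau \quad (2)$$

where $\theta = [\theta_1, \theta_2, \ldots, \theta_n]^T$ is the angle vector as defined in Fig. 2 and $\tau = [\tau_1, \tau_2, \ldots, \tau_n]^T$ is the vector of driving torques. The matrices $M(\theta)$ and $C(\theta,\dot{\theta})$ denote the inertia terms and the centrifugal and Coriolis terms, while $G(\theta)$ and $Q$ represent the gravity terms and the input matrix, respectively. For the seven-link robot presented here, $n$ is the number of degrees of freedom (DoF) of the robot, which is equal to 6. By letting $x = (\theta, \dot{\theta})$, Eq. 2 can be expressed in state-space form as follows:

$$\dot{x} = \begin{bmatrix} \dot{\theta} \\ M(\theta)^{-1}\left(Q\tau - C(\theta,\dot{\theta})\dot{\theta} - G(\theta)\right) \end{bmatrix} \quad (3)$$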
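The state-space form of the swing-phase model can be evaluated numerically by solving the equation of motion for the joint accelerations. The following is a minimal sketch under stated assumptions: the function name `swing_state_derivative`, the callables `M_fn`, `C_fn` and `G_fn`, and the one-link example values are hypothetical and only illustrate the structure; they are not the seven-link model of this work.

```python
import numpy as np

def swing_state_derivative(x, tau, M_fn, C_fn, G_fn, Q):
    """Return x_dot for x = (theta, theta_dot), given
    M(theta)*theta_ddot + C(theta, theta_dot)*theta_dot + G(theta) = Q*tau."""
    n = x.size // 2
    theta, theta_dot = x[:n], x[n:]
    # Solve M(theta) * theta_ddot = Q*tau - C*theta_dot - G rather than inverting M.
    rhs = Q @ tau - C_fn(theta, theta_dot) @ theta_dot - G_fn(theta)
    theta_ddot = np.linalg.solve(M_fn(theta), rhs)
    return np.concatenate([theta_dot, theta_ddot])

# Hypothetical one-link example (n = 1): point mass m on a massless rod of length l.
m, l, g = 1.0, 0.5, 9.81
M_fn = lambda theta: np.array([[m * l**2]])
C_fn = lambda theta, theta_dot: np.zeros((1, 1))
G_fn = lambda theta: np.array([m * g * l * np.sin(theta[0])])
Q = np.eye(1)

# At rest in the zero configuration, a torque of 0.5 gives theta_ddot = 0.5 / (m * l^2).
x_dot = swing_state_derivative(np.array([0.0, 0.0]), np.array([0.5]), M_fn, C_fn, G_fn, Q)
```

A function of this shape can be passed directly to a standard ODE integrator for the swing phase.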

Impact Phase Model
In this phase, the following assumptions are made: (1) the impact is instantaneous; (2) a collision causes the joint angular velocities, but not the joint angles, to change instantaneously; (3) because the joint actuators cannot produce impulsive forces, their contribution can be ignored; (4) there is neither sliding nor rebounding during the contact between the swing foot and the ground; (5) there is no interaction force between the ground and the former support foot, which lifts off at the end of the impact. The geometric constraint and the contact equation are given by Eqs. 4 and 5. Equation 4 serves as a condition to determine whether a ground contact has occurred. Equation 5 can be written as $\theta^{+} = W\theta^{-}$, where $\theta^{-}$ and $\theta^{+}$ denote the state variables before and after the swing foot touches the ground. Equation 5 can be obtained by comparing the robot's positions and orientations before and after the collision.
Fig. 4 The network topology
Because the roles of the two legs are switched in the impact phase, a lateral DoF and a vertical DoF are added at the former support foot to handle the ground constraints on the support foot and to record the current stride and the position of the new support foot. Therefore, the robot has $n+2$ DoFs in the impact phase, and $q_e = [q_1\ q_2\ q_3\ q_4\ q_5\ q_6\ p_{stx}\ p_{sty}]^T$ denotes the state variables of the impact phase, where $p_{stx}$ and $p_{sty}$ represent the Cartesian coordinates of the Center of Mass (CoM) of the support leg. The robot model in the impact phase can be obtained according to the Lagrange equation as follows:

$$M_e(\theta_e)\ddot{\theta}_e + C_e(\theta_e,\dot{\theta}_e)\dot{\theta}_e + G_e(\theta_e) = Q_e\tau + J(\theta_e)^T \begin{bmatrix} F_x \\ F_y \end{bmatrix} \quad (6)$$

Fig. 6 ZMP regulation flow chart
Fig. 5 The oscillation signals
Fig. 7 The schematic of the gradient estimation method
where $F_x$ and $F_y$ are the ground reaction forces and $J(\theta_e)$ is the Jacobian matrix. According to assumption (4), the swing foot neither slides nor rebounds after the contact, so the variable $x^{+}$ can be expressed in terms of $x^{-}$, where $x^{-} = (\theta^{-T}, \dot{\theta}^{-T})^T$ and $x^{+} = (\theta^{+T}, \dot{\theta}^{+T})^T$ denote the state variables before and after the swing foot touches the ground, respectively.
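A common way to turn the impact assumptions into a computable map, widely used for rigid-impact biped models, is to integrate the impact-phase dynamics over the instantaneous collision and to impose zero post-impact velocity at the contact point. This gives the linear system $M_e(\dot{\theta}_e^{+} - \dot{\theta}_e^{-}) = J^T \hat{F}$ together with $J\dot{\theta}_e^{+} = 0$, where $\hat{F}$ is the contact impulse. The sketch below assumes this standard formulation; it is not necessarily the exact form of the equations used in this work, and the function name `impact_velocity` and the point-mass example are hypothetical.

```python
import numpy as np

def impact_velocity(M_e, J, theta_dot_minus):
    """Solve the rigid, fully inelastic impact equations
        M_e (v_plus - v_minus) = J^T F_hat,   J v_plus = 0
    for the post-impact velocity v_plus and the contact impulse F_hat."""
    n = M_e.shape[0]
    k = J.shape[0]
    # Block system in the unknowns [v_plus; F_hat].
    A = np.block([[M_e, -J.T],
                  [J, np.zeros((k, k))]])
    b = np.concatenate([M_e @ theta_dot_minus, np.zeros(k)])
    sol = np.linalg.solve(A, b)
    return sol[:n], sol[n:]

# Hypothetical example: a unit point mass hitting the ground, with only its
# vertical velocity constrained (J selects the vertical component).
v_plus, impulse = impact_velocity(np.eye(2), np.array([[0.0, 1.0]]), np.array([1.0, -2.0]))
```

In this example the vertical velocity is removed by the impact while the horizontal velocity is unchanged, which matches the no-rebound part of assumption (4) for a constraint that acts only vertically.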

Hybrid Model of Robot
The robot models for the two phases were established in the previous two subsections. As a result, the complete dynamic model of the robot combines the continuous swing-phase dynamics with the discrete impact map, which is applied whenever the state reaches the constraint set $c = \{\theta \mid c(\theta) = 0\}$, in which $x = (\theta, \dot{\theta})$. The condition $c(\theta) = 0$ is described in Eq. 4.
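The alternation between continuous swing dynamics and an instantaneous impact can be simulated with a simple event-driven loop. The following is a minimal sketch, assuming a fixed-step explicit Euler integrator and a guard function whose sign change signals ground contact. The names `simulate_hybrid`, `flow`, `guard` and `reset`, as well as the toy system (a constant-velocity angle that is reset to its start value with 10 % velocity loss at each impact), are hypothetical stand-ins for the robot model.

```python
def simulate_hybrid(x0, flow, guard, reset, dt, t_end):
    """Integrate x_dot = flow(x) and apply x <- reset(x) whenever guard(x)
    crosses from positive to non-positive (the impact event)."""
    x = list(x0)
    impacts = 0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        x_next = [xi + dt * fi for xi, fi in zip(x, flow(x))]
        if guard(x) > 0.0 and guard(x_next) <= 0.0:
            x_next = reset(x_next)  # instantaneous impact and leg switch
            impacts += 1
        x = x_next
    return x, impacts

# Toy stand-in for one leg: state x = (theta, omega).
theta_max = 0.2
flow = lambda x: [x[1], 0.0]                 # swing phase: constant angular velocity
guard = lambda x: theta_max - x[0]           # contact when theta reaches theta_max
reset = lambda x: [-theta_max, 0.9 * x[1]]   # relabel the legs, lose 10 % of the velocity

x_final, n_impacts = simulate_hybrid([-theta_max, 1.0], flow, guard, reset, dt=0.001, t_end=1.0)
```

For the robot, `flow` would be the swing-phase state derivative, `guard` the swing-foot height condition and `reset` the impact map followed by the exchange of the leg roles.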

Oscillator Model
In this work, a modified phase oscillator (Kuramoto model [35]) is used as the neural oscillator model. In this model, the subscript $i$ denotes the $i$-th oscillator, $\varphi_i$ is the phase of the output $\mathrm{output}_i$, $v_i$ is the frequency, $\lambda_{ij}$ is the coupling term, $\psi_{ij}$ is the desired phase shift between oscillators $i$ and $j$, $G_r$ is a positive constant determining the speed at which $r_i$ converges to the given $R_i$, and $G_d$ is the rate at which $d_i$ converges to the given parameter $D_i$. Superimposing $d_i$ means that $\mathrm{output}_i$ is no longer an oscillatory signal symmetric about the zero axis.
As shown in Fig. 3, $v_i$ and $R_i$ determine the frequency and amplitude of the oscillator, respectively. $\lambda_{ij}$ and $\psi_{ij}$ determine the connections among oscillators and, ultimately, the topology of the oscillator network. By setting these model parameters, the desired oscillation can be generated.
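To illustrate how these parameters shape the output, the following sketch simulates a small network of coupled phase oscillators. It assumes one common Kuramoto-type form, $\dot{\varphi}_i = 2\pi v_i + \sum_j \lambda_{ij}\sin(\varphi_j - \varphi_i - \psi_{ij})$, with first-order convergence $\dot{r}_i = G_r(R_i - r_i)$ and $\dot{d}_i = G_d(D_i - d_i)$ and output $r_i\sin\varphi_i + d_i$. These equations, the function name `simulate_cpg` and all numeric values are assumptions for illustration and may differ from the exact model used in this work.

```python
import numpy as np

def simulate_cpg(v, R, D, lam, psi, phi0, G_r=5.0, G_d=5.0, dt=0.001, t_end=5.0):
    """Euler simulation of coupled phase oscillators with amplitude and offset dynamics."""
    v, R, D = (np.asarray(a, dtype=float) for a in (v, R, D))
    lam, psi = np.asarray(lam, dtype=float), np.asarray(psi, dtype=float)
    phi = np.array(phi0, dtype=float)
    r = np.zeros_like(phi)
    d = np.zeros_like(phi)
    steps = int(round(t_end / dt))
    outputs = np.zeros((steps, phi.size))
    for k in range(steps):
        # diff[i, j] = phi_j - phi_i - psi_ij
        diff = phi[None, :] - phi[:, None] - psi
        phi_dot = 2.0 * np.pi * v + np.sum(lam * np.sin(diff), axis=1)
        phi = phi + dt * phi_dot
        r = r + dt * G_r * (R - r)   # amplitude converges to R
        d = d + dt * G_d * (D - d)   # offset converges to D
        outputs[k] = r * np.sin(phi) + d
    return outputs, phi, r, d

# Two oscillators coupled in anti-phase (hypothetical values).
lam = [[0.0, 2.0], [2.0, 0.0]]
psi = [[0.0, np.pi], [-np.pi, 0.0]]
outputs, phi, r, d = simulate_cpg(v=[1.0, 1.0], R=[0.4, 0.2], D=[0.0, -0.08],
                                  lam=lam, psi=psi, phi0=[0.0, 0.1])
```

With these settings the phase difference settles at the desired shift $\psi_{01} = \pi$, while the amplitudes and offsets converge to $R_i$ and $D_i$, so the second output oscillates about $-0.08$ rather than about zero.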

The CPG Network Topology
As shown in Fig. 4, a CPG network topology is built for the biped robot in view of its kinematic characteristics. The network's output signals and the relationships among them are displayed in Fig. 5. The corresponding coupling matrix is expressed by Eq. 14, which is obtained by referring to the human walking pattern.

CPG-ZMP Hybrid Control
A hybrid algorithm that integrates ZMP information as feedback to achieve a stable walking pattern is presented. The oscillation signals are set as the ideal joint trajectories. The control torque $\tau_i(q, \dot{q})$ is generated by a PD controller on the basis of the difference between the ideal and actual joint angles, where $q_{ides}$ is the ideal angle of joint $i$ and $\dot{q}_{ides}$ is the corresponding ideal angular velocity. Joint angles are regulated to track the ideal joint trajectories.
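The PD tracking law described above can be sketched as follows. The gain names `kp` and `kd` and the numeric values are assumptions for illustration; the gains actually used in this work are those listed in the PD parameter tables.

```python
import numpy as np

def pd_torque(q_des, qd_des, q, qd, kp, kd):
    """Joint torques from the position and velocity tracking errors."""
    return kp * (q_des - q) + kd * (qd_des - qd)

# Hypothetical single-joint example: 0.1 rad position error, -0.5 rad/s velocity error.
tau = pd_torque(np.array([0.2]), np.array([0.0]),
                np.array([0.1]), np.array([0.5]), kp=100.0, kd=10.0)
```

Because `kp` and `kd` may be scalars or per-joint arrays, the same function serves all six joints when the CPG outputs are supplied as `q_des`.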
For the biped robot restricted to the sagittal plane, the ZMP [36] in the x-axis direction can be calculated according to Eq. 16 (the moment of inertia of each link about its centroid is neglected), where $(x_{cord,i}, y_{cord,i})$ are the Cartesian coordinates of the centroid of link $i$, $m_i$ is its mass, $n = 7$ is the number of links of the robot and $g$ is the gravitational acceleration.
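When the link inertias about their centroids are neglected, the sagittal-plane ZMP is commonly written as $x_{zmp} = \left(\sum_i m_i(\ddot{y}_i + g)x_i - \sum_i m_i\ddot{x}_i y_i\right) / \sum_i m_i(\ddot{y}_i + g)$. The sketch below assumes this standard point-mass form; the function name `zmp_x` and the example values are hypothetical.

```python
import numpy as np

def zmp_x(m, x, y, xdd, ydd, g=9.81):
    """Sagittal-plane ZMP of a set of point masses (link inertias neglected).

    m: link masses; (x, y): centroid coordinates; (xdd, ydd): centroid accelerations.
    """
    m, x, y, xdd, ydd = (np.asarray(a, dtype=float) for a in (m, x, y, xdd, ydd))
    numerator = np.sum(m * (ydd + g) * x) - np.sum(m * xdd * y)
    denominator = np.sum(m * (ydd + g))
    return numerator / denominator

# Static case: the ZMP coincides with the ground projection of the CoM.
x_static = zmp_x([1.0, 3.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0])

# A forward acceleration of a mass at height 1 m shifts the ZMP backward.
x_accel = zmp_x([2.0], [0.1], [1.0], [1.0], [0.0])
```

The second call shows why larger amplitudes or frequencies threaten stability: higher centroid accelerations move the ZMP toward the edge of the support region.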
Real-time ZMP information serves as feedback to optimize the walking patterns by adjusting the frequency and amplitude of the signals generated by the oscillators. Figure 6 is the flow chart of the ZMP regulation. In Fig. 6, $C_v$ and $C_R$ are the coefficients of $v_i$ and $R_i$, respectively, and $C_{v0}$ and $C_{R0}$ denote their initial values; both coefficients are restricted to bounded ranges. Therefore, the frequency $v_i$ and amplitude $R_i$ of the CPG can be modulated by varying the parameters $C_v$ and $C_R$.

CPG-Body Attitude Reflex Hybrid Control
The position and orientation of the robot are related to the walking terrain, as presented in Fig. 7. Combined with the robot position, the slope angle of the ground $\alpha$ can be detected according to Eq. 17, where $(x_{swingFoot}, y_{swingFoot})$ and $(x_{stanceFoot}, y_{stanceFoot})$ are the coordinates of the swing foot and the support foot in the reference coordinate system. The feedback loop coupled with the CPG system is expressed in terms of a gain coefficient $K$ and an offset $\mathit{offset}$.
Owing to the dynamic characteristics of CPGs, the cycle period, the amplitude, the proportion of the swing and impact phases and even the phase relationships can easily be modulated online. Thus, adaptive biped walking patterns can be generated.
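The slope detection and the reflex feedback can be sketched as follows. The sketch assumes that the slope angle is the inclination of the line joining the two foot contact points at touchdown and that the feedback is the affine map $K\alpha + \mathit{offset}$; both forms, the function names and the numeric values are assumptions for illustration rather than the exact expressions of this work.

```python
import math

def estimate_slope(swing_foot, stance_foot):
    """Slope angle (rad) of the line from the stance foot to the swing foot.

    Assumes the swing foot lands ahead of the stance foot (dx > 0).
    """
    dx = swing_foot[0] - stance_foot[0]
    dy = swing_foot[1] - stance_foot[1]
    return math.atan2(dy, dx)

def attitude_feedback(alpha, K, offset):
    """Affine feedback on a CPG coefficient driven by the detected slope angle."""
    return K * alpha + offset

# Hypothetical touchdown on a 5 degree upward slope with a 0.3 m step.
alpha = estimate_slope((0.3, 0.3 * math.tan(math.radians(5.0))), (0.0, 0.0))
coefficient = attitude_feedback(alpha, K=2.0, offset=1.0)
```

Since the estimate is only available at touchdown, a controller built on it would hold the last detected value of `alpha` during the following swing phase.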

Staged System Parameters Tuning
In this work, a staged parameter search process is applied. Firstly, numerical experiments are conducted to study the effects of the model parameters on the output signals, so that the preliminary parameters can be set much more easily. Secondly, an NSGA-II-based evolution method is used to evolve the control parameters to realize flat terrain walking. Thirdly, a GA-based approach, building on the evolved flat terrain gait pattern, is applied to evolve the control system to realize sloped terrain walking. Finally, by entraining several paths of sensory feedback, adaptive gait patterns for walking on terrains with various slope angles can be realized.
NSGA-II is employed to complete the gait pattern evolution for the entire control system [37]. Genomes are coded as arrays of neural oscillator model parameters. The distance fitness measure takes into account the straight-line distance that the robot covers along a pre-specified direction (the x-direction) in a fixed time. In this stage, the parameter evolution process is treated as a minimization problem defined in terms of $x_{position\,0}$, the robot's initial position, and $x_{position\,end}$, the position at the end of the simulation.
The second fitness measure guarantees the walking stability. The ZMP is supposed to be kept within the support region of the feet (or foot) during the walking process. The stability margin $D_s$ is defined as the stability index, while $S_x$ and $S_y$ are defined as the x and y positions of the sole's edges. The stability margins in the x and y directions can therefore be expressed as $D_{sx} = ZMP_x - S_x$ and $D_{sy} = ZMP_y - S_y$, respectively. The larger the stability index $D_s$ is, the higher the stability is. Therefore, the stability index $D_s$ is chosen as the second fitness measure.
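The two fitness measures and the selection of non-dominated solutions can be sketched as follows. The sketch assumes that both objectives are posed as minimization (the negative walking distance and the negative worst-case stability margin) and that the margin is the distance from the ZMP to the nearer sole edge; these choices and all names are assumptions for illustration, and the simple filter below only extracts the first non-dominated front rather than implementing the full NSGA-II procedure.

```python
def distance_objective(x_start, x_end):
    """Negative walking distance, so that minimizing it maximizes the distance."""
    return -(x_end - x_start)

def stability_objective(zmp_x_samples, sole_back, sole_front):
    """Negative worst-case distance from the ZMP to the nearer sole edge."""
    margin = min(min(z - sole_back, sole_front - z) for z in zmp_x_samples)
    return -margin

def pareto_front(points):
    """Return the points that no other point dominates (all objectives minimized)."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            all(qk <= pk for qk, pk in zip(q, p)) and any(qk < pk for qk, pk in zip(q, p))
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(p)
    return front

# Hypothetical (distance, stability) objective pairs of three candidate gaits.
candidates = [(-3.0, -0.02), (-2.0, -0.05), (-1.0, -0.01)]
front = pareto_front(candidates)
```

The first two candidates trade walking distance against stability margin and both remain on the front, while the third is worse in both objectives and is discarded.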
The third stage utilizes a GA to evolve the control system, starting from the previously evolved parameters, to generate a sloped terrain walking pattern. The feedback term is set to zero during this parameter evolution process. The distance fitness measure takes into account the straight-line walking distance along the sloped terrain. Finally, by entraining several paths of sensory feedback, adaptive walking patterns can be realized. For example, on an inclined terrain, the body attitude reflects the walking performance, so the body attitude information is used to modulate the trajectory generators. The feedback coefficients are obtained through simulations. During the exploration of the feedback coefficients, the other parameters, including the model parameters and mapping parameters, are kept constant.

CPG-ZMP Hybrid Algorithm for Stable Walking
To verify the presented algorithm, several experiments were performed in Matlab, with the number of walking steps of the robot fixed at 20. The control parameters were tuned by combining evolution methods and simulations. Based on the approximate ranges of the model parameters that produce stable oscillator signals, NSGA-II was used to optimize the whole system to generate biped robot motion patterns. The advised value of the crossover probability commonly ranges from 0.1 to 0.5 to ensure a sufficient crossover rate, and side effects would be induced if the crossover probability were outside the advised range. Therefore, the crossover probability for the interpolation crossover is set to 0.2 in this work. Mutation allows random variation of individuals in the search space, which effectively keeps the evolutionary algorithm from becoming trapped in local optima. The mutation probability has an advised range of 0.01-0.5. In the following experiments, the mutation probability is set to 0.1. The number of generations is set to 200 and the population size to 50 individuals. After about 100 generations, a stable straight-line gait pattern can be achieved. The physical parameters of the robot and the initial conditions of the CPG are listed in Tables 1 and 2. The evolution result of the 200th generation is shown in Fig. 8, and a trade-off relationship between the two objectives is evident. One of the best solutions on the Pareto front is picked (the chosen solution is marked in Fig. 8), and the corresponding parameters are given in Tables 3 and 4. The PD parameters are listed in Table 5. $C_{v0} = 2$ and $C_{R0} = 1$ are the initial values of the coefficients $C_v$ and $C_R$, respectively.
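The variation operators mentioned above can be sketched as follows, using the crossover probability of 0.2 and the mutation probability of 0.1 stated in the text as defaults. The sketch assumes that the interpolation crossover forms a random convex combination of two parents and that mutation resamples each gene uniformly within its bounds with the given per-gene probability; these details, the function names and the example genomes are assumptions for illustration.

```python
import random

def interpolation_crossover(parent_a, parent_b, rng, p_cross=0.2):
    """With probability p_cross, return a random convex combination of the parents;
    otherwise return a copy of parent_a."""
    if rng.random() >= p_cross:
        return list(parent_a)
    w = rng.random()
    return [w * a + (1.0 - w) * b for a, b in zip(parent_a, parent_b)]

def mutate(genome, bounds, rng, p_mut=0.1):
    """Resample each gene uniformly within its bounds with probability p_mut."""
    return [rng.uniform(lo, hi) if rng.random() < p_mut else gene
            for gene, (lo, hi) in zip(genome, bounds)]

# Hypothetical two-parameter genomes (for example, a frequency and an amplitude).
rng = random.Random(0)
bounds = [(0.5, 2.0), (0.0, 1.0)]
child = interpolation_crossover([1.0, 0.2], [1.5, 0.6], rng, p_cross=1.0)
mutant = mutate(child, bounds, rng, p_mut=1.0)
```

Because the child is a convex combination, it always stays within the range spanned by its parents, while mutation is what allows the search to leave that range.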

Simulation 1
The validity of the ZMP feedback is confirmed by comparing the control effects with and without ZMP feedback. The experimental results are displayed in Table 6. Regardless of whether the ZMP feedback is used, stable walking can be realized when the frequency and amplitude are both small. However, without feedback, the robot can easily lose its stability if the frequency or amplitude increases to a relatively large value.
The absolute joint trajectories of the robot are shown in Fig. 9. The joint trajectories are reset at the start of every step in accordance with the ground-contact information. Because the CPG parameters are regulated by the ZMP feedback in real time, $C_v$ and $C_R$ vary as shown in Fig. 10. The validity of the real-time ZMP modulation is confirmed by the periodic curves of $C_v$ and $C_R$ in Fig. 10, and the modulation reaches a steady state within 3 steps, or 1 second.
The variation of the real-time ZMP position is presented in Fig. 11. Combining Eq. 16 with the robot model parameters gives the ranges of the ZMP position in the different phases: $x_{zmp} \in [-0.1, 0.1]$ in the swing phase and $x_{zmp} \in [-(stride + 0.1), 0.1]$ in the impact phase, where $stride$ is the current stride. Figure 12 presents the height of the swing foot above level ground according to the ground collision detection described in Eq. 4.
The two stick figures, Fig. 13 with feedback and Fig. 14 without feedback, have the same initial conditions. The contrast between them demonstrates the validity of the ZMP feedback when the amplitude or frequency is relatively large. In other words, ZMP information can be used as feedback to improve the stability of the walking robot even if the initial conditions are not suitable.
In Fig. 15, the PD controller adjusts the generated torque signals according to the difference between the CPG signal, shown in Fig. 16, and the corresponding actual joint value, shown in Fig. 9. The torque signals in Fig. 15 are then used to drive the motors to execute the corresponding movement and produce the desired walking pattern. The UCPG signals are periodic, as shown in Fig. 16, and their amplitude and frequency are in line with the parameters set.
The simulations verify that the ZMP based feedback loop can improve the stability of the control system, and can also make the entire system more adaptive by regulating the motion pattern online in accordance with the walking environment.

Simulation 1: Upward Slope Walking
(1) Parameter description
At the beginning of the evolution, the biped can only keep walking for a short while and the fitness is low. Gradually, the walking distance increases. Finally, the robot can keep walking stably until the end of the experiment (Fig. 17). After about 30 generations, a stable straight-line walking pattern can be generated. The parameters of the trajectory generator are listed in Tables 7 and 8 and $D_4 = -0.080$. The PD parameters for this simulation are set as in Table 9.
(2) The feedback loop
In the upward slope walking experiments, the gain parameters are set as constants independent of $\alpha$ (Eq. 21), where $K_v$ is the frequency coefficient for all UCPGs, $K_{ri}$ is the amplitude coefficient of the $i$-th UCPG, and $K_{r2}$ is kept constant at 1 under the assumption that the swing leg is not bent.
(3) The simulation results
To test the adaptability of the presented hybrid control method, three different slopes (3°, 5° and 8°) were set for the robot to traverse. Figure 18 shows the oscillatory signals of the CPG system, and Fig. 19 shows the absolute joint angles during upslope walking. In the vicinity of 2 s and 4 s, the amplitude and period of the CPG signals change in response to the changing slope angles. Figure 20 shows the stick figure of the biped walking. The step length increases and the frequency decreases as the slope angle increases, which helps to ensure the stability of the walker on steeper slopes. The forward lean of the robot's torso also increases with the slope angle, which shifts the CoM of the robot through the weight of the upper body and prevents the robot from falling over. The experiments confirm that the robot can walk stably and adaptively on a slope with changing angles.
Figure 21 shows the variation of the coefficients of the CPG's frequency and amplitude. There is a certain delay between the moment of the slope angle change and the moment of modulation, because the robot can detect the slope angle only after the swing foot touches the ground, whereas before that landing the robot is generally still standing on a different slope. This also explains the transition fluctuations in the previous figures. The robot keeps walking as if on the previous gradient and steps forward before detecting the new gradient. This process causes a transition fluctuation, after which the new gradient is obtained. The parameters are then adjusted according to the new gradient to achieve stable and adaptive walking.
Figure 22 shows the phase plots of the joint angles. A fixed slope angle α = 12° is selected to verify the stability of the system.
In Fig. 22, the red points represent the initial points, and each joint angle curve converges to a stable limit cycle. α = 12° is the maximum slope that the seven-link robot can climb using the presented control method. Simulation and analysis show that, with the initial states under the static condition, the robot overturns when the slope angle is larger than 6°. However, the reflex feedback loop enables walking on steeper slopes by tuning the CPG's parameters to modulate the walking speed and stride in real time. When the slope angle is larger than 12°, slipping occurs.

Simulation 2: Downward Slope Adaptive Walking
In accordance with the parameters of upward walking, the parameters of the trajectory generator can be obtained to realize adaptive downward slope walking. These parameters are listed in Tables 7, 10 and 11. In addition, $D_3 = -0.080$ and $D_4 = -0.080$ are used in this process.
For downward slope walking, the feedback parameters are obtained by calculation from the measured data, and the feedback loop is designed accordingly, where $K_{ri}$ is the amplitude coefficient of the $i$-th UCPG and $\alpha$ used in the feedback is expressed in radians.
In the second set of simulations, the biped robot walks on an inclined terrain with varying slope angles (−5°, −3° and −1°). Figure 23 shows the absolute joint angles on the different slope angles. In contrast with the joint angles on the uphill, the transitions at 4 s and 8 s are smoother. That is because the slope gradually becomes gentler during the downhill walking experiment. Relative to a slope with a fixed gradient, a gradually increasing gradient makes climbing more difficult and, to a great extent, leads to instability of the robot during the transition. Figure 24 shows the stick figure of downward slope walking, which indicates that the stride of the robot increases with an increase in slope angle. This regulation guarantees the stability of the robot system in downward slope walking. As the figure shows, the robot's torso is tilted slightly backward in downward slope walking; this posture adjusts the position of the robot's centroid through the weight of the trunk, which plays a significant role in promoting the stability of the robot system.
Figure 25 demonstrates the effect of the feedback modulation in the downward slope walking process. The detected slope angle is used as feedback only to regulate the amplitude coefficient of the CPG. As Fig. 25 shows, the feedback regulation lags the slope angle change by a certain delay. This is because the robot can detect the slope angle only when the swing foot touches the ground, and it then uses the detected angle to regulate the amplitudes of the CPG signals.
Figure 26 is the phase plot of the absolute joint angles when α = −5°. The trajectory starts from the initial point and converges stably to a limit cycle, which demonstrates the stability of the robot system when α = −5°, the minimum slope (steepest descent) that the seven-link robot can walk on. Simulation shows that the robot swings considerably during downslope walking, so it easily loses stability. The reflex feedback is therefore used to adapt the relevant parameters and keep the CoM within the safe area by modulating the step length and speed. However, when the downslope gets steeper, the robot swings so much that it falls down.

Conclusion
In this work, a new online biped locomotion pattern generation strategy based on a rhythmic-reflex hybrid control method is presented. A planar seven-link biped robot is used to verify its effectiveness. Phase oscillators are used to construct the CPG network, which generates rhythmic signals for the robot to track. During the walking process, a ZMP regulation is designed to adjust the activity of the CPG system, which stabilizes the biped walking when the robot is on the verge of falling and makes the locomotion converge stably to a limit cycle. For sloped terrain walking, a biological reflex, the vestibular reflex, is imitated to adjust the control signals autonomously without changing any model parameters other than the feedback gain. A staged parameter tuning method that combines numerical simulation and an EA-based method is used to evolve the parameters of the whole control system. The bio-inspired algorithm proposed in this work can generate a more natural walking pattern with the CPG in real time and reduce the sensitivity to model errors by using ZMP feedback and reflex feedback.
The presented CPG-reflex algorithm is also applicable to biped walking in three-dimensional (3D) space. In this case, a 3D model of the robot is needed, a new neural network must be built accordingly, and the ZMP feedback can be utilized by calculating the real-time ZMP position in 3D space. In this work, a phase oscillator model is used to imitate the biological CPG. The dynamic model and the topology design of the CPG also affect the control results, so in future work a more human-like robot model will be designed to embody the naturalness of human walking. Applying the presented algorithm to a real biped robot is another important task. Moreover, in this work, walking is restricted to the sagittal plane, whereas irregular terrains and disturbance rejection ability are important for robust biped walking. How to maintain human-like active balance and robust walking for a humanoid robot will be studied in future work.

Fig. 3 Change in the frequency, amplitude and offset of the oscillator

Table 1
Physical parameters of the robot. Columns: Length, Value [m]; Mass, Value [kg]; Inertia, Value [kg·m²]

Table 2
Parts

Table 5
PD parameters of flat walking

Table 6
Simulation results of experiment 1

Table 7
The