Introduction

Industrial robots are devices used by manufacturing companies to perform difficult, dangerous, or repetitive tasks. They increase productivity, reduce manufacturing costs, and guarantee a constant quality of the worked pieces (Singh et al., 2013). Robotic manipulators are made of a sequence of rigid segments, the links, connected by movable parts, the joints. Joints are actuated by electrical motors and include a set of gears designed to efficiently deliver power from the motor to the link. A position sensor, in most cases an encoder, is always present on the motor because it is used in the position control loop.

When gears are not correctly positioned, or are worn out by use, the phenomenon known as backlash arises. Backlash causes the degradation of the performance of the robot, leading to vibrations and poor positioning accuracy, with a detrimental impact on the quality of the workpieces produced by the robot (Jaber & Bicker, 2016). Moreover, backlash increases over time eventually reaching the point where the joint breaks. Unexpected robot failures can lead to the interruption of the full production line where the robot is installed, with considerable economic losses for the manufacturing company. It is therefore crucial to estimate the backlash in joints, and plan maintenance interventions accordingly.

Typically, industrial robot maintenance follows a corrective or time-based preventive strategy (Lee & Scott, 2009; Shin & Jun, 2015). With Corrective Maintenance (CM) the maintenance intervention is performed to restore the functionality of the robot once a fault has happened. In contrast, with Time-Based Maintenance (TBM) the intervention is carried out at fixed time intervals to prevent the failure. Unfortunately, with these simple strategies unwanted production stops and unnecessary component substitutions cannot be avoided. Therefore, a third and more sophisticated strategy, Condition-Based Maintenance (CBM), is preferred. CBM is based on continuous monitoring of the physical asset to identify initial signs of faults and prevent the occurrence of failures. In this way unforeseen stops are much reduced, and maintenance interventions are performed only when necessary, with significant cost savings for the manufacturing companies. It is precisely in the context of CBM that the need for our research arises (Fig. 1).

Fig. 1

Simple schematic diagram of a single-joint robot arm. The backlash phenomenon arises between the moving parts inside the joint

The idea behind this work is to develop a CBM tool that targets maintenance interventions in case of excessive gear play in robot joints, as discussed by Jaber and Bicker (2018). Such a tool must be automatic and easily applicable, even on the population of robots already installed at the customers’ production sites. Thus, it must be based on a method that does not require the installation of additional sensors on the robot. Unfortunately, this is a very difficult condition to comply with, since standard industrial manipulators have only a position sensor for each joint, which is not enough to perform a direct measurement of the backlash. In almost all the approaches available in the literature, accelerometers, output position sensors, or torque cells are used together with the motor position sensor to measure the backlash gap. This work, instead, presents a method to estimate backlash that uses no sensor other than the motor encoder, which is always provided as standard equipment of a robot. By analyzing the motor position signal, a measurable characteristic representative of the backlash phenomenon (i.e., a feature) is extracted. Through the observation of this characteristic, excessive gear play in the joint can be easily detected, and its increase over time can be monitored. Furthermore, knowing the function that binds the value of the feature to the value of the backlash in the joint, a reliable estimate of the value of the backlash can be obtained.

The key aspect of the proposed method is that the backlash effect can be highlighted by selecting appropriate test conditions, and that even faint traces of its presence can be detected as a disturbance on the motor speed signal. According to our studies, the disturbance signal induced by backlash in specific settings has a peculiar shape in the time domain that can therefore be taken as a signature of the backlash presence. Moreover, the feature that is representative of the amount of backlash in the joint is one of the parameters of the mathematical model that describes this signature.

A test bench with the joint of a real COMAU industrial robot was designed and assembled to validate the methodology. The bench was used both to provide data and to be the reference for the development of a Matlab/Simulink model of the system. Through the use of simulation, the backlash disturbance and its evolution over time were investigated. As a result, a mathematical model of the reference disturbance was constructed and key parameters of the model were identified. Finally, to provide an estimate of the backlash level in the joint, a meta-heuristic algorithm was used to scan the encoder signal and fit the signature to the data. Giovannitti et al. (2019) presented a feasibility study of the approach on a few simulated case studies, and a comparison of the performance of different meta-heuristic algorithms for backlash detection was later published (Giovannitti et al., 2021). In the present work, the analysis is extended by considering new simulated data affected by noise and real-world data from robotic manipulators operating in a manufacturing plant.

The rest of the paper is organized as follows: the most common approaches to measure joint backlash are addressed in the “Backlash and measurement strategies” section; the proposed approach, the details about the test bench, the Simulink model, the system dynamics and the system perturbations associated with backlash are described in the “Proposed approach” section; the results of the method when tested on simulated data with noise and on real-world data are presented in the “Experimental evaluation” section; finally, the conclusions are summarized in the “Conclusions” section.

Backlash and measurement strategies

In mechanics, backlash is a clearance caused by gaps between mating parts. In the specific case of coupled gears, it is the excess space between mating teeth (British-Standard-Institution, 2007) (Fig. 2).

Fig. 2

The backlash gap between mating teeth of gears

A small amount of backlash is always present in geared devices. It is necessary to guarantee a smooth movement of parts and to prevent the teeth from wedging in (Mobley, 2001). The proper amount of backlash is carefully established by design (Japanese-Standards-Association, 1976; Japanese-Standards-Association, 2013). Specifically, the space between consecutive teeth has to be slightly wider than the tooth thickness of the mating gear. This small clearance allows the teeth to easily slide on each other, and makes it possible for the lubricant to penetrate deeply between the parts. Unfortunately, the initial clearance defined by design increases over time due to tooth wear, and most of the benefits provided by design are lost.

In robotic joints, a proper amount of backlash ensures an effective transmission of torque from the motor to the link. Under these conditions, good control of the link movement and good performance of the robotic arm can be obtained. However, when backlash increases, additional space is created between gear teeth, causing short interruptions in the transmission of torque. As a result, good control of the link movement is no longer guaranteed (Kubela et al., 2016). The interruptions happen especially during the reversals of motion, when the driving wheel reverses its rotation and changes the mating tooth on the driven wheel. During that interval, the space between two consecutive teeth has to be traversed before tooth contact is restored (Fig. 3). When the teeth are not in contact the load movement is free and no longer controlled by the motor, so a loss of precision in the movement of the robot can occur. Even worse, when the contact is recovered on the new tooth, an impact may develop and excite the mechanical structure of the joint, giving rise to vibrations.

Fig. 3

The backlash states on the reversal of movement. The mating teeth of two geared wheels are represented in the three phases that compose a reversal of movement. The first state is on the left: the mating teeth are in contact and the motion is transferred from the driving (lower) to the driven (upper) gear. In the second state, the motion of the driving gear is reversed: the contact with the driven gear is lost and remains interrupted while the driving tooth crosses the backlash gap. Finally, in the last state, the contact between the two gears is recovered and the driven gear starts moving in the reversed motion together with the driving gear

In mating gears the measure of backlash is obtained by calculating the travelling space of the driving gear during the reversal of movement. It can be measured as the relative displacement between the driving and the driven element. In the ideal case of zero backlash, or when the mating gears are in contact, this displacement is zero. When backlash is present in the system, and in particular when the backlash gap is open, the two positions are slightly different, and the maximum of their relative displacement gives the measure of the backlash gap (Merzouki et al., 2003). Hence, the easiest way to quantify backlash is to use two sensors, one for the driving and one for the driven gear, and to calculate the difference between the two measured positions (Yamada et al., 2016).

The same reasoning made for a single pair of mating gears can be extended to a full transmission chain. In this case, the backlash of the transmission is the sum of the backlashes of the many components, and the two required sensors for the measurement should be located at the input and at the output of the transmission.
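As a minimal sketch of the two-sensor idea described above, the following Python function takes the maximum relative displacement between input and output positions as the backlash estimate. The function name, the argument names, and the convention that both signals are sampled synchronously and zeroed while the teeth are in contact are assumptions made for illustration only.

```python
def backlash_from_two_sensors(theta_in, theta_out, gear_ratio=1.0):
    """Estimate the backlash gap from two synchronously sampled positions.

    theta_in   : positions measured at the transmission input (rad)
    theta_out  : positions measured at the transmission output (rad)
    gear_ratio : input-to-output reduction ratio (hypothetical parameter),
                 used to refer the input position to the output side
    Assumes both signals read zero relative displacement when in contact.
    """
    if len(theta_in) != len(theta_out):
        raise ValueError("position signals must have the same length")
    # Relative displacement between driving and driven side, sample by sample
    relative = [ti / gear_ratio - to for ti, to in zip(theta_in, theta_out)]
    # The largest relative displacement is taken as the backlash measure
    return max(abs(r) for r in relative)


# Illustrative data: the output stays still while the input crosses the gap
theta_in = [0.0, 0.001, 0.002, 0.003]
theta_out = [0.0, 0.0, 0.0, 0.001]
gap = backlash_from_two_sensors(theta_in, theta_out)
```

In this made-up example the output starts moving only after the input has travelled 0.002 rad, so that value is returned as the gap.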

Traditionally, measurements of backlash in robotic joints are conducted manually, either by robot arm manufacturers or by expert technicians, and the measurements are performed when the manipulator is at rest. The procedure involves braking the motor, while manually rotating the link at the same time. In this way, the input gear that is connected to the motor is held in place, while the output gears connected to the link are allowed to move back and forth inside the space between mating teeth. This play of the link represents the backlash of the joint and is measured with an externally mounted dial indicator. When performed in a production plant, this procedure requires the intervention of skilled personnel and a production stop of considerable duration, with the consequences of high maintenance costs and loss of earnings for the manufacturing company. A further issue of this procedure is that backlash measurements performed with a robot at rest can be slightly different from measurements performed on the same robot while operating. Thermal factors can act on the shape of the gear teeth, widening or narrowing them, changing the mating behaviour and thus altering the measure.

Considerable effort is spent in industrial research on devising dynamic, fast and possibly automated backlash measurement procedures. Many references can be found in the literature on the problem of measuring backlash in a robotic joint. The great majority of them are based on the use of a pair of sensors, one at the joint input (e.g., on the motor) and the second at the joint output (e.g., on the link). Those based on the use of only one sensor typically adopt an accelerometer.

The first researchers to propose a method to measure backlash in robotic joints were Dagalakis and Myers (1985). Their method was based on the use of an accelerometer mounted on the driven link, and on the exploitation of the coherence function calculated between the link acceleration and the motor voltage. Other researchers, like Jaber and Bicker (2016, 2018) and Lima et al. (2009), also made use of accelerometers on the robot link. Unlike Dagalakis and Myers, they focused their work on the use of vibration analysis techniques (e.g., wavelet analysis) to calculate the value of the backlash. A completely different solution was the one proposed by Sarkar et al. (1997), who tried to measure backlash by analyzing impact signals collected with a torque sensor mounted on the driving part of the joint (i.e., on the motor). Li et al. (2021) also tried to detect the backlash-induced tooth-tooth impacts within the transmission, but they used a gyroscope on the driven part of the joint (i.e., on the load). Other solutions use torque and position sensors (Hovland et al., 2002) to automatically estimate the backlash in robot transmissions, or two position sensors and a Kalman filter (Beinke et al., 1998; Lagerberg & Egardt, 2007) for an online identification of backlash.

Unfortunately, all these solutions rely on sensors that are not present on standard industrial robots (Zhang, 2000). Thus, they require the installation of additional sensors on the joints, leading to increased costs for robot manufacturers. For this reason, the above solutions see little to no adoption in industrial contexts, where measuring procedures based on the already available on-board sensors would be more easily accepted. This usually means considering solutions that rely on a single position sensor, mounted on the motor. With this setup, only a partial knowledge of the phenomena that are happening between the many elements in the robotic transmission can be achieved, making the problem of estimating backlash more difficult to address. Very few references can be found in the literature about this particular topic; a short summary can be found in the work of Yang et al. (2012).

Going into detail, two different methods of backlash identification using only the motor-side position sensor were presented by Gebler and Holtz (1998). In the first method only the motor position signal was used for the identification, since the load position was assumed to be stationary. To meet this condition, very small motor torque impulses were considered. While the backlash gap was open and the load was disengaged from the motor, such impulses were sufficient to move the driving part through the gap. Instead, once the backlash gap was traversed and the contact with the load was restored, the small impulses were not sufficient to start moving the load too. So, the load always remained still, and the full displacement of the motor during the gap crossing provided the amplitude of the backlash. Unfortunately the method, designed on a simple electromechanical system, cannot be easily applied to robotic joints, where intense static friction phenomena make it very difficult to work with small torque values. The second method proposed by Gebler and Holtz used the motor current signal in addition to the motor encoder signal. The time derivative of the motor current was inspected to identify the peaks indicating the instants of decoupling and engagement of the gears. Then the motor displacement in the time interval defined by the two peaks was used as a measure of the backlash. The approach needs smooth current signals to correctly detect the small peaks related to the decoupling/engagement condition, so ad-hoc filtering has to be designed. Since the filtering has to be changed according to the specific testing situation, the method is difficult to generalize and therefore impractical in industrial use.

In a similar way, Márton and Lantos (2009) exploited the analysis of speed and current signals from the motor to estimate backlash and friction in the joint. In their experiments they used a SCARA robot. In the method the two meaningful time instants when the motor decouples (\(t_1\)) and engages (\(t_2\)) the load were identified through a joint analysis of the speed and current signals from the motor. Then the measure of the backlash gap was obtained by integrating the velocity on the [\(t_1\); \(t_2\)] time interval. To capture these moments, specific test conditions were defined for the arm movement. Unfortunately such conditions, easily met in a SCARA robot, can hardly be reproduced in a standard articulated manipulator.
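The integration step described above can be illustrated with a short Python sketch. It assumes that the decoupling and engagement instants \(t_1\) and \(t_2\) have already been identified (the identification itself is not shown) and that they coincide with sample times; the function name and the use of the trapezoidal rule are choices made here for illustration, not details taken from the cited work.

```python
def backlash_by_velocity_integration(t, omega, t1, t2):
    """Integrate the motor velocity over [t1, t2] with the trapezoidal rule.

    t     : sample times (s), strictly increasing
    omega : motor velocity samples (rad/s), same length as t
    t1,t2 : decoupling and engagement instants, assumed already known
    Returns the magnitude of the angle travelled while the gap is open.
    """
    if len(t) != len(omega):
        raise ValueError("time and velocity vectors must have the same length")
    gap = 0.0
    for k in range(1, len(t)):
        # Keep only the sampling intervals fully contained in [t1, t2]
        if t[k - 1] >= t1 and t[k] <= t2:
            gap += 0.5 * (omega[k] + omega[k - 1]) * (t[k] - t[k - 1])
    return abs(gap)


# Illustrative signal: constant 2 rad/s between t = 1 s and t = 3 s
t = [0, 1, 2, 3, 4, 5]
omega = [0, 2, 2, 2, 0, 0]
gap = backlash_by_velocity_integration(t, omega, t1=1, t2=3)
```

With this made-up signal the two intervals inside the window each contribute 2 rad, so the sketch returns 4 rad.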

Similarly, Villwock and Pacas (2008) presented a method to measure the magnitude of backlash by using the motor velocity as the only measurement. Again, the backlash gap was measured by integrating the velocity signal over the time interval between the instants of decoupling and engagement of the mechanics, but in this case such instants were identified without the use of the current signal. A triangular test function was used to excite the system and drive the gears through the backlash. Then, the identification of the two reference instants was performed by looking at specific characteristics of the resulting motor speed signal.

Lastly, Stein and Wang (1995, 1996, 1998) proposed the idea of identifying and measuring backlash by detecting fast changes in the speed of the driving gear of the transmission. The changes were caused by impacts between the gear teeth, arising when the backlash looseness was too high. The same authors also showed that the magnitude of the impacts was proportional to the gear backlash. An open-loop, simple electromechanical system was considered for the design and testing phases, and a sinusoidal input voltage was used to stimulate the system.

The method proposed in the present paper is a generalization of the above solution. It extends the applicability of the solution to closed-loop systems, and to systems with a more complex mechanical structure. It also simplifies the test conditions by eliminating the need for a sinusoidal input signal. Furthermore, unlike Stein and Wang’s method and unlike all the methods found in the literature, the backlash is not measured during a reversal of movement but during a continuous movement of the link. Under this condition, the effects of gravity on the link act on the system, opening the backlash gap. Finally, the proposed method relies on a fully automated analysis procedure, based on a meta-heuristic algorithm. For all the reasons cited above, the presented method is highly convenient for industrial contexts.

Proposed approach

A robotic joint is a mechanical system with elastic properties; when properly excited, it exhibits oscillations. Excessive space between mating parts in moving gears can generate impacts, and such impacts, in turn, can excite oscillations. By creating proper test conditions, these oscillations can be observed on the motor speed signal and used as a reference to estimate the backlash in the joint. Backlash-induced oscillations show a characteristic appearance that makes them recognizable among the possible types of disturbances affecting the encoder signal. These oscillations can be represented by a mathematical model, and considered as the characteristic signature of the presence of the backlash in the transmission. When the signature is detected on the motor speed signal, an excessive level of backlash will be present in the robotic joint. Moreover, since the amplitude of the oscillation is directly related to the amount of the backlash in the transmission, the disturbance amplitude can be used as a reference to obtain an estimate of the backlash value.

With this in mind, the proposed procedure to measure backlash in the joint consists of (i) making the robotic joint perform some test movements while recording the motor speed signal, (ii) fitting the backlash signature model to the motor speed signal, and (iii) using the resulting values of the model parameters to estimate the backlash, \(\delta \), in the joint. The workflow is reported in Fig. 4.

Fig. 4

Steps of the proposed procedure

Backlash signature and its mathematical model

In a properly excited robotic joint affected by backlash, a disturbance can be observed on the speed signal sensed by the motor encoder. The disturbance takes the shape of a sequence of many damped oscillations superimposed onto the motor speed signal, see Fig. 5 for an example.

Fig. 5

Motor speed signal and the superimposed oscillations due to an excessive value of the backlash in the joint

To excite this behaviour, a test was performed by keeping the rotating axis of the joint horizontal and by running the motor at constant speed. Unlike what is usually done when measuring backlash, no inversions of the motion were considered; instead, the continuous motion of the motor was studied. Under these conditions, it was possible to disregard phenomena like friction, and consider gravity as the only force acting on the link. A damped oscillation was noticed every time the contact between motor and load was restored after a backlash gap opening: at the moment of contact an impact occurs and generates the damped oscillation. The peculiar shape of such a backlash-induced oscillation was described by the mathematical model:

$$\begin{aligned} d_b(t) = {\left\{ \begin{array}{ll} 0 &{} t< t_1 \\ A\,e^{-(t-t_1)\tau }\sin {\omega (t)} &{} t_1\le t\le t_2 \\ 0 &{} t> t_2 \\ \end{array}\right. } \end{aligned}$$
(1)

where \(t_1\) is the starting time of the oscillation, A is an amplitude factor, \(\tau \) is a damping factor, \(\omega \) is the angular frequency of the oscillation, and \(t_2\) is the ending time of the disturbance. The disturbance signal is composed of a sequence of many of these oscillations. Within this sequence all the oscillations have the same amplitude, but with alternating signs (positive and negative). This is due to the particular test movement performed to excite the mechanical structure. It generates two impacts per load turn: one at the beginning of the descending phase of the load movement, one at the beginning of its ascending phase. At each of these moments the backlash gap opens and then closes. So, two oscillations per load turn are generated. They can be described with the time-limited function f(t):

$$\begin{aligned} f(t) = d_b(t)-d_b(t-t_d) \end{aligned}$$
(2)

where \(t_d\) is the starting point of the second oscillation.

Since several replicas of this function can be seen within a temporal window of observation, the overall disturbance model h(t) is defined as the sum of n shifted replicas of f(t):

$$\begin{aligned} h(t) = \sum _{i=1}^{n} f(t-(i-1)t_f) \end{aligned}$$
(3)

where \(t_f\) is the time shift corresponding to a \(2\pi \mathrm{rad}\) rotation of the load.
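Eqs. (1) to (3) can be sketched in a few lines of Python. This is an illustrative transcription only: the function names mirror the symbols in the text, the sine argument of Eq. (1) is read literally as \(\omega t\) (an assumption about the notation), and the numerical values in the usage lines are made up.

```python
import math


def d_b(t, A, tau, omega, t1, t2):
    """Eq. (1): one damped oscillation, zero outside [t1, t2]."""
    if t < t1 or t > t2:
        return 0.0
    # Sine argument taken as omega * t (assumed reading of the notation)
    return A * math.exp(-(t - t1) * tau) * math.sin(omega * t)


def f(t, A, tau, omega, t1, t2, t_d):
    """Eq. (2): two oscillations of opposite sign, the second delayed by t_d."""
    return d_b(t, A, tau, omega, t1, t2) - d_b(t - t_d, A, tau, omega, t1, t2)


def h(t, A, tau, omega, t1, t2, t_d, t_f, n):
    """Eq. (3): sum of n replicas of f, each shifted by one load turn t_f."""
    # i = 0 .. n-1 here corresponds to (i - 1) with i = 1 .. n in Eq. (3)
    return sum(f(t - i * t_f, A, tau, omega, t1, t2, t_d) for i in range(n))


# Illustrative (made-up) parameter values
params = dict(A=2.0, tau=5.0, omega=50.0, t1=0.1, t2=0.3)
signal = [h(k * 0.01, t_d=0.5, t_f=1.0, n=2, **params) for k in range(200)]
```

Evaluating `h` on a uniform time grid, as in the last line, gives a sampled disturbance that is zero between oscillations and changes sign from one oscillation to the next within each load turn.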

The model can be further trimmed by choosing a reference starting point for the disturbance, \(t_0\), and defining the meaningful time periods as intervals referred to \(t_0\),

$$\begin{aligned} t_1=t_0+T_1, t_2=t_0+T_2, t_d=t_0+T_d, t_f=t_0+T_f \end{aligned}$$
(4)

Equation 1 then changes to

$$\begin{aligned} d_b(t-t_0) = {\left\{ \begin{array}{ll} 0 &{} t< t_0+T_1 \\ A\,e^{-(t-(t_0+T_1))\tau }\sin {\omega (t-t_0)} &{} t_0+T_1\le t\le t_0+T_2 \\ 0 &{} t> t_0+T_2 \\ \end{array}\right. } \end{aligned}$$
(5)

while f(t) becomes

$$\begin{aligned} f(t-t_0) = d_b(t-t_0)-d_b(t-(t_0+T_d)) \end{aligned}$$
(6)

with a resulting disturbance model that is fully characterized by 7 different parameters

$$\begin{aligned} \begin{array}{ll} h(t, A, t_0, \tau , \omega , T_1, T_2, T_d, T_f,n) = \\ \sum _{i=1}^{n} f(t, A, t_0+(i-1) T_f, \tau , \omega , T_1, T_2, T_d). \end{array} \end{aligned}$$
(7)

Equation 7 can be considered the signature of the backlash (i.e., of an excessive value of backlash) in the robotic transmission.

The function and its parameters are shown in Fig. 6.

Fig. 6

Visualization of the h(t) function with its time-related parameters. In the example a disturbance model with \(n=2\) replicas is considered. The starting point of the disturbance \(t_0\), and the meaningful time intervals \(T_1, T_2, T_d, T_f\) are reported. The two damped oscillations, \(d_b(t)\) and \(d_b(t-t_d)\), related to a first (\(n=1\)) turn of the load are reported in the left half of the graph. They are followed by two more oscillations relative to a second (\(n=2\)) turn of the load

To avoid identification errors due to noise, more than one replica of the f(t) function was considered in the h(t) model. In this way the mean value of the parameters was de facto identified. A model with \(n=12\) disturbance repetitions was considered for the experiments. Also, \(T_d\) was set as \(T_d = T_f / 2\) since two impacts happen in a full load rotation.

Just as the backlash changes over time, its signature also changes. Therefore, a model-based analysis was used to obtain the signature evolution over time. In particular, the relationship between the backlash amount in the joint and the characteristics of the backlash signature was studied.

Matlab/Simulink model of the robotic joint

To simulate robotic joint aging and the corresponding backlash increase, a Matlab/Simulink model of the joint was developed. The model has helped to fully understand the effects of the backlash on the system dynamics, and to quickly obtain the data related to the long-term wear of the gears.

The Simulink model developed for the robotic joint is reported in Fig. 7.

Fig. 7

Simulink model of the system. The main blocks of the system and their interconnections are shown in a, while details about the motor and link blocks are reported in b and c respectively

It was made of 4 main blocks: (i) the motor with the encoder, (ii) the transmission, (iii) the load, and (iv) the position/speed control loop, with linear feedback from the motor encoder. The total backlash of the system, termed \(\delta \) and given by all the loosely connected elements in the transmission, was modeled within the transmission block. The gravity effect on the load, \(G(\theta _l)\), was considered in the load block through the equation of the load dynamics.

The system used to model the dynamics of the joint is shown in Fig. 8.

The motor and load were represented by the two lumped inertias, \(J_m\) and \(J_l\), while the transmission was modeled by an elastic coupling with stiffness \(K_s\) and damping \(D_s\). The small transmission inertia was neglected. A dead zone model for the backlash was considered, and the angular gap was represented by \(2\delta \). Finally, the gear ratio was represented by N. To model the backlash dynamics, the modified dead zone model described by Papageorgiou et al. (2017, 2019) was used. In the model, the backlash effects are represented as the effects of a variable coupling stiffness, \(K_{BL}(\varDelta \theta )\),

$$\begin{aligned}&K_{BL}(\varDelta \theta , \delta ) =\nonumber \\&\quad \frac{K_s}{\pi }[\pi +\arctan (\alpha (\varDelta \theta -\delta ))-\arctan (\alpha (\varDelta \theta +\delta ))] \end{aligned}$$
(8)

that assumes the constant value \(K_s\) outside the backlash gap (for \({|\varDelta \theta |}>\delta \)), and a value of 0 inside the gap (for \({|\varDelta \theta |}\le \delta \)), see Fig. 9. By considering this variable stiffness, the torque transmitted to the load is described with

$$\begin{aligned} \tau _l = [\varDelta \theta -\delta \cdot sign(\varDelta \theta )+\frac{D_s}{K_s}\varDelta \omega ] \cdot K_{BL}(\varDelta \theta , \delta ) \end{aligned}$$
(9)

and is proportional to \(\varDelta \theta \) and \(\varDelta \omega \) outside the backlash gap, while it is null inside the gap. As a result, the interruption in the motor to load motion transmission described in “Proposed approach” section is modeled. This model was selected among the many available in the literature because, unlike the typical dead zone models, it does not present discontinuities. The arctan function, in fact, provides a sudden but continuous stiffness variation. This feature was extremely useful for obtaining a controlled simulation, as discontinuities typically cause instabilities in the simulated systems. A short survey of the most commonly used models of backlash can be found in the work of Nordin and Gutman (2002) and Nordin et al. (1997).
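The behaviour of Eqs. (8) and (9) can be checked numerically with a short Python sketch. The function names are hypothetical, \(\alpha \) is treated here as the steepness of the arctan transition (an assumption, since the text does not define it), and all numerical values are illustrative rather than taken from the test bench.

```python
import math


def k_bl(dtheta, delta, k_s, alpha):
    """Eq. (8): smooth variable stiffness, ~0 inside the gap, ~k_s outside."""
    return (k_s / math.pi) * (
        math.pi
        + math.atan(alpha * (dtheta - delta))
        - math.atan(alpha * (dtheta + delta))
    )


def load_torque(dtheta, domega, delta, k_s, d_s, alpha):
    """Eq. (9): torque transmitted to the load through the backlash."""
    sign = (dtheta > 0) - (dtheta < 0)  # sign(dtheta), 0 when dtheta == 0
    deflection = dtheta - delta * sign + (d_s / k_s) * domega
    return deflection * k_bl(dtheta, delta, k_s, alpha)


# Illustrative values (assumptions, not identified parameters)
K_S, D_S, ALPHA, DELTA = 1000.0, 1.0, 1e6, 1e-3

inside = k_bl(0.0, DELTA, K_S, ALPHA)    # stiffness in the middle of the gap
outside = k_bl(0.1, DELTA, K_S, ALPHA)   # stiffness well outside the gap
torque = load_torque(0.1, 0.0, DELTA, K_S, D_S, ALPHA)
```

With a finite \(\alpha \) the stiffness inside the gap is small but not exactly zero; a larger \(\alpha \) sharpens the transition at the cost of a stiffer simulation problem.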

The parameters of the Simulink model were estimated by processing experimental data collected on a test bench, and the final validation of the simulator was performed by comparing the current and the encoder signals from Simulink with the corresponding signals from the test bench. The comparison is shown in Fig. 10, where the light color is used for the signals measured on the test bench, while the dark color is adopted for the signals generated by the Simulink model. As evident from the plots, the simulator was able to reproduce the behavior of the joint with backlash.

Fig. 8

Two rotary inertia system with elastic coupling and backlash. A simplified dead zone model for backlash is used

Fig. 9

Variable stiffness of the coupling

Dataset creation

The effects of the backlash on the robotic joint dynamics were studied through simulation. Controlling the value of the backlash parameter in the model enabled both the analysis of a wide range of scenarios and time-effective data gathering. By leveraging the simulation, the aging process of the joint was accelerated, and a heterogeneous set of data was quickly obtained. Collecting the same data from a real-world system would have required months or even years of observation.

All the simulations were performed under the same working conditions of constant motor speed (100 rps) and constant load (5 kg). Data from the motor encoder were collected and divided into 10 sets. Each set corresponds to a different level of backlash within the interval \([3\times 10^{-4}\mathrm{rad}; 21 \times 10^{-4} \mathrm{rad}]\), where the minimum and maximum values were chosen as significant for the system under test. A discrete step, \(\varDelta \delta = {2\times 10^{-4}}\mathrm{rad}\), was used to span the interval.
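As a quick arithmetic check, the interval and step quoted above do yield exactly 10 backlash levels. The helper below is a hypothetical illustration, not part of the simulation setup.

```python
def backlash_levels(start=3e-4, stop=21e-4, step=2e-4):
    """Return the backlash values (rad) spanning [start, stop] with the given step."""
    # Rounding avoids off-by-one errors caused by floating-point division
    n = int(round((stop - start) / step)) + 1
    return [start + k * step for k in range(n)]


levels = backlash_levels()
count = len(levels)  # number of simulated datasets
```

The list runs from \(3\times 10^{-4}\) rad to \(21\times 10^{-4}\) rad in steps of \(2\times 10^{-4}\) rad, one value per dataset.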

Fig. 10

Simulation and actual data comparison. Signals from the simulator (black) and from the test bench (light blue) are compared. Motor position, motor speed, and motor current are reported. Backlash disturbance effect is clearly visible in both simulated and actual data showing the same characteristics (color figure online)

Backlash signature behaviour

The datasets obtained by simulation, corresponding to different values of backlash, were used to investigate the relationship between the backlash variation and the change in the disturbance signal. It was found that the amplitude of the disturbance oscillation and the backlash value in the joint are related, with the disturbance oscillation regularly increasing as the backlash gap widens. Such behaviour is visible in Fig. 11, where the disturbance signals from the datasets are plotted together, showing the effects of a constantly increasing backlash. The A parameter of the disturbance model, being related to the disturbance amplitude, was taken as a reference for the backlash estimate and was considered as the fault growth parameter to be monitored. By looking at the value of A over time, some information about the change in backlash in the system is obtained. To get an absolute backlash measurement, a calibration stage is also required. This step is necessary to associate the value of A with the backlash value in the particular system considered. The value of A, in fact, depends on the mechanical characteristics of the system under test.

For the simulated system, the mapping \(\delta (A)\) between the backlash value in the joint and the corresponding value of A is reported in Fig. 12. The function was built by plotting the maximum amplitude of the signals in Fig. 11 against the corresponding backlash values. Finally, the points were fitted with a cubic polynomial function. The resulting function was

$$\begin{aligned} \delta = f(A) = 10^{-6} \cdot \left( -0.0018 \cdot A^3 + 0.4323 \cdot A^2 + 9.2524 \cdot A + 34.0313 \right) \end{aligned}$$
(10)

with a fitting error (RMSE) of \({8.6983 \times 10^{-10}}\mathrm{rad}\).
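To make the use of Eq. (10) concrete, the sketch below evaluates the fitted polynomial with Horner's scheme. The function name is hypothetical, the units of A are assumed to be those used in the fit (they are not restated here), and the mapping should only be trusted inside the range of amplitudes covered by the simulations.

```python
# Coefficients of Eq. (10), highest power first; the 1e-6 factor is applied at the end.
COEFFS = (-0.0018, 0.4323, 9.2524, 34.0313)


def backlash_from_amplitude(a):
    """Evaluate delta = f(A) of Eq. (10) with Horner's scheme; result in rad."""
    acc = 0.0
    for c in COEFFS:
        acc = acc * a + c
    return 1e-6 * acc


# Illustrative amplitudes only; they are not values from the paper's tables.
for amplitude in (20.0, 40.0, 60.0):
    print(amplitude, backlash_from_amplitude(amplitude))
```

Because the mapping is specific to the simulated system, a different joint would need its own calibration before a function of this kind could be reused.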

Fig. 11

Disturbance pattern at increasing backlash gap. Each signal corresponds to the disturbance vibration visible on the motor speed signal for a different value of backlash. The vibration is excited by the small, backlash caused, impacts inside the mechanical transmission. The 10 signals correspond to the 10 different backlash values considered for the simulations

Fig. 12

The relationship that binds the backlash in the joint \(\delta \) with the value of the amplitude of the speed disturbance A, for the particular system that has been simulated. Each point corresponds to a single simulation. The \(\delta \) coordinate is the backlash value used to run the simulation, while the A coordinate is the amplitude of the corresponding disturbance observed on the encoder signal. The 10 pairs \((A,\delta )\) correspond to the 10 datasets considered. The continuous line is their cubic polynomial fit, \(\delta =f(A)\)

This function was considered as the static characteristic of the backlash virtual sensor.

Problem formulation

As noted above, estimating the backlash \(\delta \) first requires calculating A. The value of A, together with the values of all the other parameters in the model given by Eq. (7), is obtained by solving an optimization problem. The problem finds the combination of parameter values that makes the model h(t) best fit the signal from the encoder. Since the problem is non-convex, classical optimization techniques cannot be straightforwardly applied. Thus, a stochastic optimization meta-heuristic is exploited. The problem formulation and the characteristics of the algorithm are described hereafter. As a first step, the difference signal r(t) between the measured speed and the theoretical speed (i.e., evaluated for the no-backlash condition) is computed. The resulting signal is

$$\begin{aligned} r(t)=v(t)-v_t \end{aligned}$$
(11)

where \(v_t\) is the commanded motor speed defined by the test conditions. Then, the difference signal is scanned to detect the backlash disturbance pattern h(t). The detection relies on the minimization of the error between the residual signal and the model signal h(t). In particular, the cost function used is the Root Mean Square Error (RMSE) between the two signals

$$\begin{aligned} RMSE = \sqrt{{\frac{\sum _{i=1}^{N}\Big (r(t_i) - h(t_i, A, t_0, \tau , \omega , T_1, T_2, T_d, T_f)\Big )^2}{N}}}. \end{aligned}$$

The parameters to be identified are then

$$\begin{aligned} X = [A, t_0, \tau , \omega , T_1, T_2, T_d, T_f] \end{aligned}$$

and their upper and lower bounds are shown in Table 1. Reference values are derived from specifications of the system evaluated in the experiments.
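The residual of Eq. (11) and the RMSE cost can be sketched as follows. The disturbance model of Eq. (7) is not reproduced in this section, so `toy_model` is a hypothetical stand-in (a delayed, damped sinusoid with only four parameters) used purely to make the example runnable; it is not the authors' h(t), and all numeric values are illustrative.

```python
import math


def residual(measured_speed, commanded_speed):
    """Eq. (11): r(t) = v(t) - v_t, computed sample by sample."""
    return [v - commanded_speed for v in measured_speed]


def rmse_cost(params, times, r, model):
    """RMSE between the residual r and model(t, *params) over the N samples."""
    n = len(r)
    squared = sum((ri - model(ti, *params)) ** 2 for ti, ri in zip(times, r))
    return math.sqrt(squared / n)


def toy_model(t, amplitude, t0, tau, omega):
    """Hypothetical stand-in for h(t); NOT the disturbance model of Eq. (7)."""
    if t < t0:
        return 0.0
    return amplitude * math.exp(-(t - t0) / tau) * math.sin(omega * (t - t0))


# Synthetic speed signal: commanded speed plus the toy disturbance.
times = [k * 1e-3 for k in range(200)]
true_params = (2.0, 0.05, 0.03, 300.0)
speed = [100.0 + toy_model(t, *true_params) for t in times]
r = residual(speed, 100.0)

cost_true = rmse_cost(true_params, times, r, toy_model)
cost_off = rmse_cost((1.0, 0.05, 0.03, 300.0), times, r, toy_model)
print(cost_true, cost_off)
```

An optimizer would call `rmse_cost` repeatedly with candidate parameter vectors X restricted to the bounds of Table 1 and keep the vector with the lowest cost.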

Stochastic optimization algorithm

Different nature-inspired optimization strategies, such as swarm algorithms and evolutionary algorithms, were considered to perform the fitting. Tests were carried out and their results compared to select the most suitable strategy among four different swarm algorithms and an evolutionary algorithm. The criteria used for the comparison were the ease of implementation and the quality of the solution in terms of accuracy and precision. The results reported by Giovannitti et al. (2021) showed that, even if promising in terms of ease of implementation and memory occupation, none of the swarm algorithms considered for the test proved suitable for the problem of interest. The best performance was delivered by the evolutionary algorithm Covariance Matrix Adaptation Evolution Strategy (CMA-ES).

CMA-ES is an optimization method first proposed by Hansen et al. (1995) in the mid-90s, and further developed in subsequent years (Hansen & Ostermeier, 2001; Hansen et al., 2003). Similar to quasi-Newton methods, the CMA-ES is a second-order approach estimating a positive definite matrix within an iterative procedure. More precisely, it exploits a covariance matrix, closely related to the inverse Hessian on convex-quadratic functions. The approach is best suited for difficult non-linear, non-convex, and non-separable problems, of at least moderate dimensionality (i.e., \(n \in [10,100]\)). In contrast to quasi-Newton methods, the CMA-ES neither uses nor approximates gradients, and does not even presume their existence. Thus, it can be used where derivative-based methods, e.g., Broyden-Fletcher-Goldfarb-Shanno (Fletcher, 2013) or conjugate gradient, fail due to discontinuities, sharp bends, noise, local optima, etc.

In CMA-ES, iteration steps are called generations due to its biological foundations. The value of a generic algorithm parameter y during generation g is denoted by \({y^{(g)}}\). The mean vector \({\mathbf {m}}^{(g)} \in {{\mathbb {R}}^n}\) represents the favorite, most-promising solution so far. The step size \(\sigma ^{(g)} \in {{\mathbb {R}}_ + }\) controls the step length, and the covariance matrix \({\mathbf {C}}^{(g)} \in {{\mathbb {R}}^{n \times n}}\) determines the shape of the distribution ellipsoid in the search space. Its goal is, loosely speaking, to fit the search distribution to the contour lines of the objective function f to be minimized. The covariance matrix is initialized to the identity, \({\mathbf {C}}^{(0)} = {\mathbf {I}}\).

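In each generation g, \(\lambda \) new solutions \({{\mathbf {x}}}_k^{(g+1)} \in {{\mathbb {R}}^n}\) are generated by sampling a multivariate normal distribution with mean \({\mathbf {m}}^{(g)}\) and covariance \(\left( \sigma ^{(g)}\right) ^2{\mathbf {C}}^{(g)}\), which is equivalent to scaling a zero-mean sample of \({\mathcal {N}}({\mathbf {0}},{\mathbf {C}}^{(g)})\) by \(\sigma ^{(g)}\) and shifting it by \({\mathbf {m}}^{(g)}\) (see Eq. 12).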

$$\begin{aligned} {\mathbf {x}}_k^{(g + 1)} \sim {\mathcal {N}}\left( {{{{\mathbf {m}}}^{(g)}},{{\left( {{\sigma ^{(g)}}} \right) }^2}{{{\mathbf {C}}}^{(g)}}} \right) \text {, } k=1, \ldots , \lambda \end{aligned}$$
(12)

where the symbol \(\cdot \sim \cdot \) denotes the same distribution on the left and right side.
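The sampling step of Eq. (12) can be illustrated with a small pure-Python sketch. It draws standard normal vectors and colours them with a Cholesky factor of the covariance matrix; reference CMA-ES implementations typically use an eigendecomposition instead, so the factorization and the function names here are assumptions made for brevity.

```python
import math
import random


def cholesky(c):
    """Lower-triangular L with L L^T = c, for a symmetric positive definite c."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(c[i][i] - s)
            else:
                low[i][j] = (c[i][j] - s) / low[j][j]
    return low


def sample_population(mean, sigma, cov, lam, rng):
    """Draw lam candidates x_k ~ N(mean, sigma^2 * cov), as in Eq. (12)."""
    n = len(mean)
    low = cholesky(cov)
    population = []
    for _ in range(lam):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # z ~ N(0, I)
        # y = L z has covariance L L^T = cov
        y = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        population.append([mean[i] + sigma * y[i] for i in range(n)])
    return population


rng = random.Random(0)
identity = [[1.0, 0.0], [0.0, 1.0]]
candidates = sample_population([1.0, -2.0], 0.3, identity, 6, rng)
print(len(candidates), len(candidates[0]))
```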

After the sampling phase, new solutions are evaluated and ranked. \({{\mathbf {x}}_{i:\lambda }}\) denotes the \(i^{th}\) ranked solution point, such that \(f({{\mathbf {x}}_{1:\lambda }}) \le \ldots \le f({{\mathbf {x}}_{\lambda :\lambda }})\). The \(\mu \) best among the \(\lambda \) are selected and used for directing the next generation \(g + 1\). First, the distribution mean is updated (see Eq. 13).

$$\begin{aligned} {{{\mathbf {m}}}^{(g + 1)}} = \sum \limits _{i = 1}^\mu {{w_i}{\mathbf {x}}_{i:\lambda }^{(g + 1)}} \text {, } w_1 \ge \ldots \ge w_\mu > 0 \text {, } \sum \limits _{i = 1}^\mu {{w_i}} = 1 \end{aligned}$$
(13)
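The ranking, selection and weighted recombination of Eq. (13) can be sketched as below. The text only requires positive, non-increasing weights that sum to one; the logarithmic rank-based weights used here are a choice commonly found in CMA-ES implementations and are an assumption, as are the function names.

```python
import math


def recombination_weights(mu):
    """Positive, decreasing weights summing to 1 (log-rank choice, an assumption)."""
    raw = [math.log(mu + 0.5) - math.log(i) for i in range(1, mu + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def update_mean(population, fitness, mu):
    """Eq. (13): weighted mean of the mu best candidates (lowest cost first)."""
    ranked = [x for _, x in sorted(zip(fitness, population), key=lambda p: p[0])]
    weights = recombination_weights(mu)
    n = len(population[0])
    mean = [sum(weights[i] * ranked[i][j] for i in range(mu)) for j in range(n)]
    mu_eff = 1.0 / sum(w * w for w in weights)  # variance effective selection mass
    return mean, weights, mu_eff


population = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
fitness = [3.0, 0.5, 2.0, 1.0]  # illustrative cost values, lower is better
mean, weights, mu_eff = update_mean(population, fitness, mu=2)
print(mean, weights, mu_eff)
```

With these illustrative costs, the two selected candidates are [1, 1] and [3, 3], and the new mean lies between them, closer to the better one.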

In order to optimize its internal parameters, the CMA-ES tracks the so-called evolution paths, sequences of successive normalized steps over a number of generations. \({{{\mathbf {p}}}^{(g)}_\sigma } \in {{\mathbb {R}}^n}\) is the conjugate evolution path, initialized as \({\mathbf {p}}^{(0)}_\sigma = {\mathbf {0}}\). \(\sqrt{2} \frac{{\varGamma \left( {\frac{{n + 1}}{2}} \right) }}{{\varGamma \left( {\frac{n}{2}} \right) }} \approx \sqrt{n} + {\mathcal {O}}\left( {\frac{1}{n}} \right) \) is the expectation of the Euclidean norm of a \({\mathcal {N}}\left( {{{\mathbf {0}}},{{\mathbf {I}}}} \right) \) distributed random vector, used to normalize paths. \({\mu _{{\text {eff}}}} = {\left( {\sum \limits _{i = 1}^\mu {w_i^2} } \right) ^{ - 1}}\) is usually denoted as the variance effective selection mass. Let \({c_\sigma } < 1\) be the learning rate for cumulation for the step size control, and \({d_\sigma } \approx 1\) be the damping parameter for the step size update. Paths are updated according to Eqs. (14) and (15).

$$\begin{aligned} {{\mathbf {p}}}_\sigma ^{(g + 1)} = (1 - {c_\sigma }){{\mathbf {p}}}_\sigma ^{(g)} + \sqrt{{c_\sigma }(2 - {c_\sigma }){\mu _{{\text {eff}}}}} \, \left( {{\mathbf {C}}}^{(g)} \right) ^{ - \frac{1}{2}} \frac{{{{{\mathbf {m}}}^{(g + 1)}} - {{{\mathbf {m}}}^{(g)}}}}{{{\sigma ^{(g)}}}} \end{aligned}$$
(14)
$$\begin{aligned} {\sigma ^{(g + 1)}} = {\sigma ^{(g)}}\exp \left( {\frac{{{c_\sigma }}}{{{d_\sigma }}}\left( {\frac{{\left\| {{{\mathbf {p}}}_\sigma ^{(g + 1)}} \right\| }}{{\sqrt{2} \frac{{\varGamma \left( {\frac{{n + 1}}{2}} \right) }}{{\varGamma \left( {\frac{n}{2}} \right) }}}} - 1} \right) } \right) \end{aligned}$$
(15)
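The step-size control of Eqs. (14) and (15) can be sketched for the special case in which the covariance matrix is the identity, so that its inverse square root is also the identity. This restriction is an assumption made to keep the example in pure Python; the general case needs a matrix square root. The parameter values in the usage lines are illustrative and are not the defaults of any particular implementation.

```python
import math


def expected_norm(n):
    """E||N(0, I)|| = sqrt(2) * Gamma((n + 1) / 2) / Gamma(n / 2)."""
    return math.sqrt(2.0) * math.exp(math.lgamma((n + 1) / 2.0) - math.lgamma(n / 2.0))


def update_step_size(p_sigma, sigma, mean_old, mean_new, c_sigma, d_sigma, mu_eff):
    """Eqs. (14)-(15), restricted to C = I so that C^(-1/2) = I (assumption)."""
    n = len(p_sigma)
    gain = math.sqrt(c_sigma * (2.0 - c_sigma) * mu_eff)
    p_new = [(1.0 - c_sigma) * p_sigma[i] + gain * (mean_new[i] - mean_old[i]) / sigma
             for i in range(n)]
    norm = math.sqrt(sum(p * p for p in p_new))
    sigma_new = sigma * math.exp((c_sigma / d_sigma) * (norm / expected_norm(n) - 1.0))
    return p_new, sigma_new


# A long step of the mean lengthens the path and increases sigma ...
_, sigma_grow = update_step_size([0.0, 0.0], 1.0, [0.0, 0.0], [10.0, 0.0], 0.5, 1.0, 2.0)
# ... while a mean that does not move shortens the path and decreases sigma.
_, sigma_shrink = update_step_size([0.0, 0.0], 1.0, [0.0, 0.0], [0.0, 0.0], 0.5, 1.0, 2.0)
print(sigma_grow, sigma_shrink)
```

The comparison of the path length with the expected norm is what drives the adaptation: paths longer than expected under random selection enlarge the step size, shorter ones reduce it.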

\({{{\mathbf {p}}}^{(g)}_c} \in {{\mathbb {R}}^n}\) is the evolution path, with \({\mathbf {p}}^{(0)}_c = {\mathbf {0}}\). Let \({c_c} < 1\) be the learning rate for cumulation for the rank-one update of the covariance matrix. Let \({\mu _{{\text {cov}} }}\) be the parameter for weighting between the rank-one and rank-\(\mu \) updates, and \(c_{{\text {cov}} } \le 1\) be the learning rate for the covariance matrix update. The covariance matrix \({\mathbf {C}}\) is updated according to Eqs. (16) and (17).

$$\begin{aligned} {{\mathbf {p}}}_c^{(g + 1)}= & {} (1 - {c_c}){{\mathbf {p}}}_c^{(g)} + \sqrt{{c_c}(2 - {c_c}){\mu _{{\text {eff}}}}} \frac{{{{{\mathbf {m}}}^{(g + 1)}} - {{{\mathbf {m}}}^{(g)}}}}{{{\sigma ^{(g)}}}}\nonumber \\ \end{aligned}$$
(16)
$$\begin{aligned} {{{\mathbf {C}}}^{(g + 1)}}= & {} (1 - {c_{{\text {cov}} }}){{{\mathbf {C}}}^{(g)}} + \frac{{{c_{{\text {cov}} }}}}{{{\mu _{{\text {cov}} }}}} \nonumber \\&\times \left( {\mathop {{{\mathbf {p}}}_c^{(g + 1)}}\nolimits ^{} \mathop {{{\mathbf {p}}}_c^{(g + 1)}}\nolimits ^T + \delta \left( {h_\sigma ^{(g + 1)}} \right) {{{\mathbf {C}}}^{(g)}}} \right) \nonumber \\&+\, {c_{{\text {cov}} }}\left( {1 - \frac{1}{{{\mu _{{\text {cov}} }}}}} \right) \sum \limits _{i = 1}^\mu {{w_i}} {\text {OP}} \left( {\frac{{{{\mathbf {x}}}_{i:\lambda }^{(g + 1)} - {{{\mathbf {m}}}^{(g)}}}}{{{\sigma ^{(g)}}}}} \right) \nonumber \\ \end{aligned}$$
(17)

where \({\text {OP}} \left( {{\mathbf {X}}} \right) = {{\mathbf {X}}} {{{\mathbf {X}}}^{{\mathbf {T}}}} = {\text {OP}} ( - {{\mathbf {X}}})\).
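The covariance update of Eqs. (16) and (17) can be written out term by term as in the sketch below, using nested lists as matrices. The quantity \(\delta (h_\sigma )\) is not defined in this section, so it is exposed as the argument `delta_h` with a default of zero, which is an assumption; the function names and the numeric values in the usage lines are illustrative.

```python
import math


def outer(x):
    """OP(x) = x x^T for a vector x given as a list."""
    return [[xi * xj for xj in x] for xi in x]


def update_covariance(cov, p_c, mean_old, mean_new, ranked, weights, sigma,
                      c_c, c_cov, mu_cov, mu_eff, delta_h=0.0):
    """Eqs. (16)-(17); `ranked` holds the mu best candidates, best first."""
    n = len(cov)
    gain = math.sqrt(c_c * (2.0 - c_c) * mu_eff)
    # Eq. (16): evolution path for the rank-one update
    p_new = [(1.0 - c_c) * p_c[i] + gain * (mean_new[i] - mean_old[i]) / sigma
             for i in range(n)]
    rank_one = outer(p_new)
    # Rank-mu term: weighted sum of outer products of the normalized steps
    rank_mu = [[0.0] * n for _ in range(n)]
    for w, x in zip(weights, ranked):
        step = outer([(x[i] - mean_old[i]) / sigma for i in range(n)])
        for i in range(n):
            for j in range(n):
                rank_mu[i][j] += w * step[i][j]
    # Eq. (17): decay of the old matrix plus rank-one and rank-mu contributions
    cov_new = [[(1.0 - c_cov) * cov[i][j]
                + (c_cov / mu_cov) * (rank_one[i][j] + delta_h * cov[i][j])
                + c_cov * (1.0 - 1.0 / mu_cov) * rank_mu[i][j]
                for j in range(n)] for i in range(n)]
    return p_new, cov_new


identity = [[1.0, 0.0], [0.0, 1.0]]
p_c, cov = update_covariance(identity, [0.0, 0.0], [0.0, 0.0], [1.0, 0.0],
                             [[1.0, 0.0]], [1.0], 1.0,
                             c_c=0.5, c_cov=0.5, mu_cov=2.0, mu_eff=1.0)
print(p_c, cov)
```

In this toy call the mean moved along the first axis only, so the variance grows along that axis relative to the second one, which is the intended effect of the update: the search distribution stretches in directions that recently led to improvement.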

Most noticeably, the CMA-ES requires almost no parameter tuning for its application. The choice of strategy internal parameters is not left to the user. Notably, the default population size \(\lambda \) is comparatively small to allow for fast convergence. Restarts with increasing population size have been shown to be useful for improving the global search performance (Auger & Hansen, 2005), and they are nowadays included as an option in the standard algorithm.

Table 1 Acceptable ranges of the model’s parameters
Table 2 Results of the experimental evaluation, on artificial datasets with noise, and different values of A

Experimental evaluation

Several data sets were processed with the CMA-ES algorithm to detect the oscillations created by the backlash, and to obtain the parameters of the disturbance model. Particular attention was given to the A parameter since, as described in paragraphs 3 and 4, the amplitude of the first oscillation is closely related to the amplitude of the backlash. Monitoring the evolution of parameter A over time gives an indication of the backlash progression in the joint. It should be noted that the proposed method estimates the overall backlash of the transmission of the robotic joint. In the case of a transmission with many components suffering from backlash, whether they are gears or gearboxes, the value measured by the algorithm is given by the sum of all their effects. This is motivated by our interest in measuring the overall effect that backlash has on the performance of the robotic joint, in order to schedule the proper timing for the maintenance intervention. The identification of the specific component that originates the problem was out of the scope of this work.

To replicate more closely a real-world situation, a further improvement was introduced for the simulated data sets. Since the simulated data always started from a fixed time point, while real-world collected data has a starting point that is unknown, a random sample of the data was chosen as starting point for the data analysis. It was randomly selected, using a uniform probability, among the first 100 samples of the dataset.
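The random choice of the starting sample can be sketched as follows. The function name and the convention that the discarded prefix has between 0 and 99 samples are assumptions consistent with the description above.

```python
import random


def random_start(samples, max_offset=100, rng=random):
    """Pick a start index uniformly among the first max_offset samples and trim."""
    start = rng.randrange(max_offset)
    return start, samples[start:]


signal = list(range(1000))  # placeholder for an encoder speed record
start, trimmed = random_start(signal, rng=random.Random(1))
print(start, len(trimmed))
```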

Since the optimization algorithm chosen for the approach was stochastic, the identification procedure was run several times on the same data set to analyze the repeatability of the results. Then the results were compared considering the error between the expected and the estimated value of A.

Both the source code of the tools and the data used in the experiments described in this paper are available under the European Union Public Licence (Footnote 1), version 1.2 or later, from a public repository on GitHub (Footnote 2).

Testing noise sensitivity on simulated data

In real-world applications the presence of noise in a captured signal is unavoidable. Noise may distort and obscure the information content of the signal, with a negative impact on the detection of the backlash signal. The datasets used for the previous batch of experiments, and described in the “Dataset creation” section, were built by simulation; hence they only contain the effects of the phenomena considered in the model and, more importantly, they are noise free. To achieve working conditions as close as possible to the actual conditions of a robot, random noise was added to the clean data. The noisy datasets were tested and the effect of noise on the performance of the fitting algorithm was evaluated.

An additive noise model was considered, and the noisy signal was taken as the sum of the original clean signal, v(t), and the noise signal n(t):

$$\begin{aligned} v_n(t) = v(t) + n(t) \end{aligned}$$
(18)

where:

$$\begin{aligned} n(t) \sim \frac{A_n-A_{min}}{\varDelta A} \cdot {\mathcal {U}} (-0.5, 0.5) \end{aligned}$$
(19)

The signal n(t) is a uniform white noise with zero mean. The amplitude of the noise was chosen to be different for each dataset, and proportional to the corresponding expected backlash disturbance amplitude. This choice was made to avoid cases where the use of a fixed noise amplitude could have obscured the backlash information in the signal. Datasets where the backlash disturbance is small are thus corrupted with a small level of noise, while an increased level of noise is used for datasets corresponding to a larger backlash disturbance on the signal. It is expected that systems more affected by backlash will also feature larger amounts of noise.
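The additive noise model of Eqs. (18) and (19) can be sketched as below. The symbols \(A_n\), \(A_{min}\) and \(\varDelta A\) are not defined in this section, so they are passed in as plain arguments; reading them as the expected amplitude of the dataset, the smallest expected amplitude and the amplitude spacing is an assumption, and the numbers in the usage lines are illustrative.

```python
import random


def add_uniform_noise(clean, a_n, a_min, delta_a, rng):
    """Eqs. (18)-(19): v_n(t) = v(t) + scale * U(-0.5, 0.5)."""
    scale = (a_n - a_min) / delta_a  # noise scale grows with the amplitude A_n
    return [v + scale * rng.uniform(-0.5, 0.5) for v in clean]


clean = [100.0] * 1000  # placeholder for a noise-free simulated speed signal
noisy = add_uniform_noise(clean, a_n=3.0, a_min=1.0, delta_a=1.0, rng=random.Random(0))
print(min(noisy), max(noisy))
```

With this scaling the peak noise deviation is half the scale factor, and a dataset whose amplitude equals the minimum one receives no noise at all.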

To verify the robustness of the method with respect to data corrupted by noise, a first batch of experiments is performed. Ten datasets are considered to evaluate the ability of the algorithm to detect backlash in the early stages of its appearance. Once again, tests are repeated 30 times on each dataset, to obtain a statistical distribution of the results. A summary of the experimental results is reported in Table 2.

In the table, each row corresponds to 30 runs of the proposed approach on the same dataset. The datasets are sorted by the corresponding expected value of A, and the amplitude of the noise added to the original clean signal is reported in the second column. The third and fourth columns report the estimate of A given by the algorithm, expressed as the mean value and the variance of the results over the 30 repeated experiments. Finally, the last column reports the relative error between the estimated value of A and its theoretical value. The data show that the error is always below the threshold of 10%. In terms of backlash value this means a maximum absolute error of \({0.0004}\mathrm{rad}\) in the estimate of the backlash value for the dataset with the highest backlash (Speeddata_noise_bk0.0021), and an absolute error lower than \({0.0002}\mathrm{rad}\) in the estimate of the backlash value for the dataset with the smallest backlash (Speeddata_noise_bk0.0003). This indicates that the proposed approach provides reliable estimates of the backlash value in the early stages of its appearance, but unfortunately the quality of the estimate decreases as the backlash increases. A further noteworthy result is the one obtained when the analysis is applied to a dataset that contains only noise and no backlash. The very small estimated value confirms that the algorithm correctly recognizes the absence of backlash. Although a further improvement of the method could be considered to expand the range of validity of the estimate, the results obtained so far are already sufficient for the initial purpose of the present work. They show that the method can distinguish “healthy” working conditions from the early stages of backlash and identify the increase of the backlash in the robot joint.
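The per-dataset summary described above (mean, variance and relative error over repeated runs) can be computed with a few lines of Python. The use of the population variance and the definition of the relative error as the deviation of the mean estimate from the expected value are assumptions, since the exact conventions are not stated here; the sample values are invented for illustration and are not taken from Table 2.

```python
import statistics


def summarize_runs(estimates, expected):
    """Mean, variance and relative error (%) of repeated estimates of A."""
    mean = statistics.mean(estimates)
    variance = statistics.pvariance(estimates)  # population variance (assumption)
    rel_error = 100.0 * abs(mean - expected) / abs(expected)
    return mean, variance, rel_error


# Invented estimates standing in for repeated CMA-ES runs on one dataset.
runs = [9.0, 10.0, 11.0]
mean_a, var_a, err_a = summarize_runs(runs, expected=10.0)
print(mean_a, var_a, err_a)
```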

Testing on real-world data from the test bench

The final part of the experimental evaluation was about the assessment of the method’s performance on data collected from real-world systems. In particular the signals used were taken from a test bench. The test bench is a simple mechanical system consisting of one of the six joints of a small industrial manipulator belonging to the COMAU robot family. The exact model cannot be disclosed due to confidentiality agreements with the company. The joint was detached from the robot and mounted on a bench. With this setup the backlash effect on the encoder was evaluated avoiding any possible interference arising from the other joints of the robot.

Fig. 13

Test bench with the examined robotic joint

Table 3 Results of the experimental evaluation, on real-world datasets obtained by running experiments on a physical system with different levels of backlash
Table 4 Results of the experimental evaluation, on datasets from two robots working in an industrial plant

The bench consists of a motor, a transmission belt, a reducer, a cast iron mass which serves as load, and an encoder (see Fig. 13). The encoder is an absolute encoder with 19-bit resolution, with \({1 \times 10^{-5}}\mathrm{rad}\) as the minimum detectable displacement.
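As a plausibility check on the quoted resolution, and assuming that the 19 bits encode one full revolution of the encoder shaft (an assumption, since this is not stated explicitly), the angular step would be

$$\begin{aligned} \varDelta \theta = \frac{2\pi }{2^{19}} = \frac{2\pi }{524288} \approx 1.2 \times 10^{-5}\,\mathrm{rad} \end{aligned}$$

which is consistent with the order of magnitude given for the minimum detectable displacement.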

The joint was both position and speed controlled, and was driven by a standard COMAU robot controller. To cause wear in the mechanical components and stimulate the onset of backlash, the system was operated continuously under maximum stress conditions for more than 8 months. Every 3 weeks, the stress cycle was stopped to execute a specific test cycle and collect data from the motor encoder. Together with the encoder signal capture, a static measurement of the backlash was performed manually by a skilled technician. The static measurements served as the reference values for the final validation of the backlash estimate provided by the proposed approach. Three different datasets, corresponding to subsequent states in time of the system, were considered for the test of the algorithm. A summary of the experimental results is reported in Table 3.

Testing on real-world data from robots

As a final step, the full robot system was considered for the assessment of the method. The goal of this last part of the assessment was to establish if the methodology developed on the single-joint test-bench was still valid for a multi-joint system. In this case, the backlash detection is more difficult, since data collected from a joint may be affected by disturbances arising from the other joints and hiding the backlash effects.

Two medium-sized COMAU robots of the same model, operating in an industrial plant in Brazil and performing material handling or spot welding applications, were considered for the assessment. The robots were labeled Robot A and Robot B. Data was gathered by means of a specific test cycle, properly designed to stimulate the oscillating backlash disturbance, and easily performed during a short production stop. Unfortunately, the actual value of the backlash in the joint was not measured on these robots, as the short production stop was not long enough to perform the static measurements. Nevertheless, from the quality of the workpieces produced, it was still possible to assess that Robot B was affected by backlash, while Robot A was not. The results of the test reported in Table 4 show that the proposed approach is able to detect the presence of backlash in a joint even in the case of a full robot system. Data reported in the table are related to axis 4 of the robots.

Conclusions

This paper presents a new method to estimate the value of the backlash in a robotic joint by using the signals collected with the motor encoder. Unlike other approaches to backlash measurement, no auxiliary sensors or tools are required. The method is built around the stimulation and the detection of a backlash-caused disturbance visible on the motor speed signal. The detection is performed in the time domain by fitting a mathematical model of the disturbance to the speed data. A state-of-the-art meta-heuristic strategy, the CMA-ES, is used to perform the fitting, owing to the good performance it showed in avoiding the many local minima of the complex optimization problem. An estimate of the amount of backlash in the joint is given by measuring the amplitude of the disturbance. Model-based simulations show that the amplitude of the disturbance and the size of the backlash gap are closely related, and that their relationship is well described by a simple monotone polynomial function. The backlash disturbance model, the CMA-ES optimization and the function \(\delta =f(A)\), which relates the amplitude of the disturbance to the amount of backlash, are the key elements to build the virtual sensor for the backlash.

Two sets of experimental evaluations are performed to assess the validity of the method. A first batch of tests is conducted on simulated data with noise. A second batch concerns real-world data, collected from a single-joint test-bench and from robots working in a manufacturing site. Results show that the method is highly capable of distinguishing the joints with backlash from those without backlash and of providing highly reliable estimates of the backlash value in the early stages of its appearance. However, the quality of the estimate decreases at high values of backlash. Such behavior is still compliant with the needs of a maintenance system, which requires an accurate estimate of small values of backlash, to carefully evaluate the postponement of unnecessary maintenance interventions, while it requires a less accurate estimate of high values of backlash, which are probably close to the point where a maintenance intervention is strongly recommended in any case.

The proposed method can be of great use for industry since it is based on a fast, simple, and easily automatable procedure. It does not require the installation of additional sensors, and only relies on the use of a standard, 19-bit resolution, motor encoder. The data acquisition phase relies on standard test movements, and can be performed by any robot user, without the intervention of skilled technicians. Furthermore, it only takes a few minutes to be performed. The data analysis phase takes longer but, since it is automatic, it can be executed offline at a later time. In this scenario, the manufacturing downtime is considerably reduced along with the loss of revenues for the company. Moreover, the method is general and can be applied to any mechanical device that is potentially subject to backlash.

At the time of writing, the presented procedure is implemented in an Industrial Internet of Things (IIoT) Platform that is installed in some automotive manufacturing plants and is collecting data from robots. These data are required for the fine tuning of the method and their collection will be completed within a few months.