1 Introduction

Teleoperation systems for remote welding have the potential to keep humans out of risky, high-hazard situations and environments, such as remote sites like deserts, combat zones, areas of natural disaster, underserved sites [1]–[3], or hyper/hypobaric conditions, such as high-altitude and underwater environments [4]–[6]. Equally, teleoperation enables the most productive use of scarce expertise. However, latency is a major issue in conventional telerobotic systems [7] and one of the principal challenges for any network-reliant teleoperation system [8, 9]. Latency has been shown to be one of the most important factors, if not the most important, in operational success, ease of use, and quality. It causes adverse effects such as low-quality welds and lost operating time, as well as larger consequences, such as catastrophic failure of a welded section or injury to a human operator. Overall system delay has a direct impact on performance [10], and studies [11, 12] show humans are sensitive to latency as low as 10 ms, which can cause sensory dissonance and motion sickness [13, 14].

Teleoperation robots have been studied mainly in controlled local environments, but physically separating the master and slave systems introduces additional sources of delay [15]. The narrow spectrum of available radio frequencies limits the data rates of wireless channels and, combined with random reflections, attenuation, and other fluctuations in space, leads to inevitable packet losses and random delays [16]. Analysis of wireless network capacity shows that, under the operating conditions of the proposed system, packet losses increase and throughput capacity decreases as the number of intermediate nodes grows with distance [17].

With the introduction of 5G, ultra-reliable low-latency communication is achievable in controlled environments [16], but remote scenarios remain very difficult [18] and reliable connections are costly [19]. Intercontinental latency, for example, reaches several hundred milliseconds in field experiments [20]–[23]. In particular, experiments exploring system limits in extreme environments, over distances of 10,000 km, found delays of 0.2–0.4 s. Similar experiments [24] show that very large distance is a significant factor in latency effects, with round-trip delays of up to 1.4 s.

Video-feedback systems, such as those implemented through mixed reality (MR), require substantial bandwidth and have also been shown to be a major source of delay [16]–[27]. Even with significant bandwidth allocation, online VR connections may involve delays of up to 100 ms, particularly during extreme events such as those likely to occur in the intended operating environment of the proposed system [28, 29]. Advanced telesurgery robots incorporate routines to suspend operations upon detecting the corruption of multiple packets or a connection breakdown. Although the threshold detection methods for these robots are not published [30, 31], Xu et al. suggest a 100- to 200-ms latency threshold to avoid major inaccuracies in telesurgery [32, 33], while the threshold for general teleoperation with haptic feedback lies between 5.5 and 50 ms [34]–[36]. Multiple experiments have also reported high delays even under ideal conditions and with dedicated high-speed networks [15, 37]; for a low-cost system, these latencies would likely be much higher. Ease of operation and manipulation of the human interface must also be considered when reducing latency, so the solution does not create new performance or practical issues.

Methods to address latency include improved control schemes [38, 39], better network design and architecture [40, 41], and supervisory and predictive control [42]–[44]. In particular, extensive research has examined passivity-based methods using wave variables, introduced by Niemeyer and Slotine [45] and extended from Anderson and Spong’s work on bilateral control [46]. However, these approaches have only been effective for loop delays up to 0.5–1 s, with serious performance and stability limitations between 1 and 2 s due to extremely low stiffness [47]. Compensatory methods on the operator side have also been explored, including slowing down operator movements, virtual assistive tools, and scope limitations [48, 49]. While these methods alleviate the negative effects of delay, they reduce ease of use and range of applicability and increase task completion time; moreover, an on-site operator must be avoided in extreme environments.

Even with state-of-the-art hardware, weather changes cause temporary delays and outages, and surface ships may interrupt the signals of submarine welding systems, resulting in latency of up to 1–2 s. Costs must also be considered [50]. Instead of pursuing improvements in network protocols or hardware, as extensively investigated elsewhere [51, 52], we propose mitigating general network delays through shared autonomy with machine learning (ML), in this case using hidden Markov models (HMMs) as a latency mitigation protocol that predicts and generates motion interactively from the teleoperator’s partial demonstration.

More autonomous systems have been found to be more tolerant of network delays. The proposed system may be evaluated against the communication systems noted previously, as its generative features lend themselves to a general-case protocol for teleoperation robotics. HMMs have been used as a forecasting tool in various applications [53]–[57], but their use in welding has been limited [58, 59]. In particular, there is no designed approach for real-time, welding-focused HMM forecasting. The proposed system addresses this research gap.

Assistive welding systems also save time in training human operators, as cooperative manipulation of the device is achieved by smoothing the command signal sent from the human-operated closed-loop control device. Smoothing has innate benefits, since weld quality can also suffer from noisy sensors and poor signal processing [60]. Tian et al. [44] used HMM-based controllers to improve reliability, and Tanwani et al. [61] demonstrated a hidden semi-Markov model (HSMM) to compare performance under different control regimes, finding improvements in precision and operation time. Wang et al. [58] provide one of the few examples of HMMs applied to a similar welding system, developing assisted welding operation using learned intention recognition techniques.

These studies largely focus on improving performance and operation quality; the explicit application of HMMs for time-delay reduction in a welding system has not yet been addressed. In this work, we propose HMMs as a general-case latency mitigation protocol for the error-inducing time delays inherent in MR-based teleoperated welding systems. Section 2 lays down the theoretical groundwork for HMMs; Sect. 3 describes intention training and identifies optimal model parameters; Sect. 4 identifies the key parameters for forecasting. These outcomes are applied together with the theory and validated in Sect. 5, where a practical model to eliminate delay is identified. Section 6 interprets the results and outlines areas for future improvement, and Sect. 7 concludes the work.

2 Methods

2.1 System architecture

Human in the loop: the proposed robotic tele-welding system features imitative motion mapping from the user’s hand movements to the welding robot’s motions and enables spatial velocity-based control of the robot tool center point (TCP) to accurately track the welder’s hand movements. The system allows the user to intuitively and precisely manipulate the position and orientation of the end effector to adjust the corresponding welding parameters, including travel speed and the travel and work angles. The system layout is described first to illustrate the implementation target for the proposed algorithm; it is shown in Fig. 1.

Fig. 1
figure 1

System architecture showing hardware connections and configurations relative to the user and actuating robot arm.

User movements are tracked via the HTC Vive MR platform (HTC Corporation, Taiwan) and input into Unity software (Unity Technologies, USA) through a binocular head-mounted display (HMD). A monocular Logitech C615 webcam (Logitech International S.A., Switzerland) with an auto-darkening lens is mounted on the robot’s wrist to observe the welding process and provide the operator with a direct view of the workpieces. Digital twin technology is used to capture the physical UR5 robot pose during operation and allows the welder to view the rotation status of each joint. The combination of the virtual twin and the on-site video stream in MR provides comprehensive real-time monitoring of the robot’s operating status and assists in accurately and efficiently amending the welding motion based on data from the robot model. The scale ratio of the virtual UR5 robot is 1:5, so the digital twin data and motions fit the user’s view in the MR welding workspace.

This platform enables the system to track the pose of the user’s head during teleoperation, synchronizing the perspective angle and view in the generated mixed reality environment and providing appropriate options via the user interface. Operation is performed remotely, as in Fig. 2. Multisensory data is transmitted back to the user in a closed feedback loop through the Phantom haptic stylus, which represents the welding torch. The welding torch head does not collide with the workpiece during welding at the robot site, and no force sensor is attached to the end effector of the robot; instead, the haptic effect of a virtual fixture is generated and displayed at the user side to steady the user’s hand and avoid unwanted collisions. The stylus has six revolute joints. The user can observe the output movements of a digital twin of the actuating UR5 robot (Universal Robots, Denmark). The digital twin subscribes to the controls fed through rosbridge (via ROS), which are ultimately executed by the welding torch attached to the UR5 robot, as shown in Fig. 3.
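As an illustration of the rosbridge subscription described above, the following minimal Python sketch (using the roslibpy client, not necessarily the library behind the authors' Unity digital twin) shows how the UR5 joint states could be received on the operator side; the host address and topic name are placeholders.

```python
# Minimal sketch (not the authors' Unity implementation) of a digital-twin client
# subscribing to robot state over rosbridge via the roslibpy library.
import roslibpy

client = roslibpy.Ros(host='192.168.0.10', port=9090)  # rosbridge websocket server (placeholder)
client.run()

# Joint angles published by the UR5 driver drive the virtual twin's joint rotations.
joint_states = roslibpy.Topic(client, '/joint_states', 'sensor_msgs/JointState')
joint_states.subscribe(lambda msg: print(msg['name'], msg['position']))
```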

Fig. 2
figure 2

Remote operator using haptic feedback device in conjunction with MR headset on remote cloud access device (PC) to operate UR5 robot for welding.

Fig. 3
figure 3

UR5 robot arm actuation. (a) Connection of welding torch to robot. (b) Welding operation using actuator via remote control.

In addition, cloud networking is used to integrate individual components of the system and host execution programs in a single location. Benefits include heightened accessibility and lower overall cost for MR-based teleoperated welding.

In actual operation, the total delay is the sum of the delays from the simultaneous transmission of the bilateral teleoperation control signals and the video signals. Control latency in the system arises from the time required for (1) the output of the haptic pen to be converted into a digital signal; (2) the digital signal to be transmitted to the UR5 robot; and (3) the digital signal at the UR5 to be translated into mechanical movement.

As the system incorporates an MR human interface, a visual discrepancy is introduced by the video pipeline due to the time delays in: (1) digitization and compression of the camera’s visual information by the video codec; (2) transmission of the video data across the network; and (3) decompression by the remote codec for display in the operator’s MR headset. The overall effect of delay in the proposed system is shown conceptually in Fig. 4; the result is a combination of these various sources of delay.

Fig. 4
figure 4

Overall delay in the system showing the concept of an increase in delay \({d}_{i}\) due to multiple sources of delay \(\delta {t}_{i}\). Consecutive signal segments \(n = 1, 2, 3,\dots\) in black show practical delay; blue shows theoretical process flow if there was no delay.

The ROS packages libfreenect2 and IAI Kinect2 are used for image compression. The system utilizes ROS1 because ROS2 had not yet been released when system development began; ROS1 is sufficient to meet the system requirements and also aids compatibility with older systems.

A combination of control and visual latency is predicted; however, within the scope of this paper, only the control latency is calculated. The wireless connection issues (latency effects) reported in previous research are assumed to apply to the proposed system, as it uses the same server-network style of data communication over wireless links. Therefore, the measurable local latency is combined with latency values from previous research to compensate for experimental set-up limitations in the overall system latency calculations. Although a cloud network is utilized, the internal delay is extracted from the ROS-robot communication pathway on a local machine, as this is assumed to be sufficiently approximated by a Linux instance in the cloud.

2.2 Univariate training of HMMs

2.2.1 Principles

The hidden Markov model (HMM) is a statistical model that enables prediction by inferring “hidden” state variables from an input sequence. The hidden state sequence is not observed directly but is assumed to drive the trend of the observable data; once inferred, it is used to generate a likely future observable sequence. In this case, the hidden states represent the human welder’s intended motion, the input data are the y-axis velocity data, and the output forecast series is the projected future velocity data set for the robot.

The transitions in the hidden state sequence set S are assumed to follow a first-order Markov chain. A Markov chain is a stochastic process in which the value of any given state depends only on the immediately preceding state, as in Fig. 5, defined mathematically as:

$$P\left[{q}_{t}\right.\left.={S}_{j}\mid {q}_{t-1}={S}_{i},{q}_{t-2}={S}_{k},\dots \right]=P\left[{q}_{t}={S}_{j}\mid {q}_{t-1}={S}_{i}\right]$$
(1)

where \({q}_{t}\) is a member of the set \({Q}_{t}\), the set of predicted discrete hidden state variables, at time \(t\).
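To make the first-order assumption of Eq. (1) concrete, the following minimal sketch samples such a chain; the three-state transition matrix is purely illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 3-state transition matrix A; each row sums to 1.
A = np.array([[0.80, 0.15, 0.05],
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])

def sample_chain(A, pi, T):
    """Sample q_1..q_T where P(q_t | q_{t-1}, q_{t-2}, ...) = P(q_t | q_{t-1}), as in Eq. (1)."""
    q = [rng.choice(len(pi), p=pi)]
    for _ in range(T - 1):
        q.append(rng.choice(A.shape[0], p=A[q[-1]]))
    return np.array(q)

states = sample_chain(A, pi=np.array([1.0, 0.0, 0.0]), T=10)
```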

Fig. 5
figure 5

Markov assumption and HMM chain relationship representation, where \({S}_{i}\) is the set of latent state variables, and \({X}_{i}\) the set of observable variables 

The stochastic model for the process may be described:

$$\lambda =\left(A,B,\pi \right)$$
(2)

where the model λ is described by three parameters: A, the state transition matrix; B, the emission matrix; and π, the initial state distribution. The emission probability of an observable point is assumed to be a Gaussian distribution conditioned on the current hidden state. The underlying hidden state sequence is assumed to be time-homogeneous, and the initial state probability is assumed to be equal to 1.
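In code, λ = (A, B, π) with Gaussian emissions maps directly onto a Gaussian-emission HMM object. The sketch below assumes the hmmlearn library (the paper does not name its implementation) and uses the state and iteration counts selected later in Sect. 3.

```python
from hmmlearn import hmm

# Gaussian-emission HMM corresponding to λ = (A, B, π); 11 states and 8 EM
# iterations follow the selection in Sect. 3.
model = hmm.GaussianHMM(n_components=11, covariance_type="diag", n_iter=8)

# After fitting (Sect. 2.2.2), the λ parameters are exposed as:
#   model.startprob_            -> π, the initial state distribution
#   model.transmat_             -> A, the state transition matrix
#   model.means_, model.covars_ -> B, the Gaussian emission parameters
```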

2.2.2 Parameter estimation

Using the Baum-Welch algorithm, a variation of the EM algorithm [62] for estimating the parameters of an HMM, A, B, and π can be determined. The necessary forward and backward probabilities, α and β, are defined with respect to the predicted observation set O and the actual observation set o:

$${\alpha }_{i}\left(t\right)=p\left({O}_{1}={o}_{1},\dots ,{O}_{t}={o}_{t},{Q}_{t}=i\mid \lambda \right)$$
(3)
$${\beta }_{i}\left(t\right)=p\left({O}_{t+1}={o}_{t+1},\dots ,{O}_{T}={o}_{T}\mid {Q}_{t}=i,\lambda \right)$$
(4)

Using the well-established steps of initialization, induction, and termination yields [63]:

$${\alpha }_{i}\left(1\right)={\pi }_{i}{b}_{i}\left({o}_{1}\right)$$
(5)
$${\alpha }_{j}\left(t+1\right)=\left[\sum_{i=1}^{N} {\alpha }_{i}\left(t\right){a}_{ij}\right]{b}_{j}\left({o}_{t+1}\right)$$
(6)
$$P\left(\left.O\right|\lambda \right)=\sum_{j=1}^{N} {\alpha }_{j}\left(T\right)$$
(7)

In this case, as the observations are real-valued rather than discrete symbols, a Gaussian mixture model is used for the emissions, and the Baum-Welch algorithm estimates the model and the normal distribution parameters simultaneously, using

$${b}_{j}\left({o}_{t}\right)=\sum_{m=1}^{M} {c}_{jm}{\mathcal{N}}^{K}\left({o}_{t}\left|{\mu }_{jm},{\sigma }_{jm}\right.\right)$$
(8)

where \(M\) is the number of mixture components, \({c}_{jm}\) the mixture weights, and \({\mu }_{jm}\) and \({\sigma }_{jm}\) the mean and covariance of mixture component \(m\) in state \(j\). Similarly, for \(\beta\),

$${\beta }_{i}\left(T\right)=1$$
(9)
$${\beta }_{i}\left(t\right)=\sum_{j=1}^{N} {a}_{ij}{b}_{j}\left({o}_{t+1}\right){\beta }_{j}\left(t+1\right)$$
(10)
$$P\left(\left.O\right|\lambda \right)=\sum_{j=1}^{N} {\pi }_{j}{b}_{j}\left({o}_{1}\right){\beta }_{j}\left(1\right)$$
(11)

Parameter \({\gamma }_{i}(t)\), the probability of being in state \(i\) at time \(t\) in terms of \(\alpha\) and \(\beta\), can be defined:

$${\gamma }_{i}\left(t\right)=p\left({Q}_{t}=i\mid O,\lambda \right)=\frac{{\alpha }_{i}(t){\beta }_{i}(t)}{\sum_{j=1}^{N} {\alpha }_{j}(t){\beta }_{j}(t)}$$
(12)

Finally, \({\zeta }_{ij}(t)\), the probability of being in state \(i\) at time \(t\) and being in state \(j\) at time \(t+1\) in terms of \(\gamma ,\alpha ,\beta\), is defined:

$${\zeta }_{ij}\left(t\right)=p\left({Q}_{t}=i,{Q}_{t+1}=j\mid O,\lambda \right)=\frac{{\gamma }_{i}\left(t\right){a}_{ij}{b}_{j}\left({o}_{t+1}\right){\beta }_{j}\left(t+1\right)}{{\beta }_{i}\left(t\right)}$$
(13)

Maximum likelihood is used to determine optimal parameters for \(\lambda\), given the observed data set \(X={x}_{1},{x}_{2},\dots ,{x}_{n}\); in this case, the training data set. The equations to start the iterative estimation of \(\lambda\) parameters can then be obtained:

$$\overline{{\pi }_{i}}={\gamma }_{i}\left(1\right)$$
(14)
$$\overline{{a }_{ij}}=\frac{\sum_{t=1}^{T-1} {\zeta }_{ij}\left(t\right)}{\sum_{t=1}^{T-1} {\gamma }_{i}\left(t\right)}$$
(15)
$$\overline{{b }_{j}}\left(k\right)=\frac{\sum_{t=1,\ {o}_{t}={v}_{k}}^{T} {\gamma }_{j}\left(t\right)}{\sum_{t=1}^{T} {\gamma }_{j}\left(t\right)}$$
(16)

where the sum in the numerator of (16) runs only over the time steps at which the observation \({o}_{t}\) equals the k-th observation symbol \({v}_{k}\). These re-estimates can all be derived by maximizing the auxiliary function over \(\overline{\lambda }\),

$$Q\left(\lambda ,\overline{\lambda }\right)=\sum_{Q} P\left(Q\left|O,\lambda \right.\right)\mathrm{log}\left[P\left(O,Q\left|\overline{\lambda }\right.\right)\right]$$
(17)
$$\underset{\overline{\lambda }}{\mathrm{max}} \left[Q\left(\lambda ,\overline{\lambda }\right)\right]$$
(18)

where (18) will lead to a re-estimated model with increased likelihood, \(P\left(O\left|\overline{\lambda }\right.\right)\ge P(O\mid \lambda )\).
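In practice, the re-estimation loop of Eqs. (14)–(18) is run as a library EM fit. A minimal sketch follows, assuming the hmmlearn implementation and a synthetic stand-in for the y-axis velocity sequence.

```python
import numpy as np
from hmmlearn import hmm

# Stand-in for the logged y-axis velocity training sequence, shape (T, 1).
rng = np.random.default_rng(1)
y_vel = np.cumsum(rng.normal(0.0, 0.01, size=(500, 1)), axis=0)

model = hmm.GaussianHMM(n_components=11, covariance_type="diag", n_iter=8, tol=1e-4)
model.fit(y_vel)                      # Baum-Welch (EM) re-estimation of A, B, π

log_likelihood = model.score(y_vel)   # log P(O | λ), cf. Eqs. (7) and (11)
hidden_states = model.predict(y_vel)  # most likely hidden state sequence (Viterbi)
```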

3 Intention training and smoothing

In the intention training process, only a single velocity series, the y-axis velocity normal to the welding plate, was used as the training feature. Expanding to more dimensions, for example to 6D, would involve running the same algorithm with the input data changed from one axis to another and saving the results; it was therefore considered redundant to repeat the algorithm across dimensions and include the similar results in this paper. Run time was measured on an Intel Xeon 2.3 GHz CPU.

The initial data set of time vs. welding velocity was split into a training set and a test set at a 2:1 ratio. A Gaussian distribution was used for the generative probabilistic model behind the HMM, and the number of hidden states and the number of training iterations were treated as parameters to be selected. A penalization approach based on the Bayesian information criterion (BIC) [59, 64] was applied to choose the optimal number of hidden states and training iterations; the BIC selects the model \({M}_{\mathrm{BIC}}^{*}\) that minimizes the criterion:

$${M}_{\mathrm{BIC}}^{*}=\underset{{M}_{i}}{\mathrm{arg\,min}}\left[-2{\mathcal{L}}_{{M}_{i}}\left({\theta }_{i}^{*}\right)+{K}_{i}\,\mathrm{ln}\,N\right]$$
(19)

where \({M}_{i}\) is a candidate model with parameter vector \({\theta }_{i}\) of length \({K}_{i}\), \({\theta }_{i}^{*}\) is the maximum likelihood estimate of \({\theta }_{i}\), \({\mathcal{L}}_{{M}_{i}}\) is the corresponding log-likelihood, and \(N\) is the number of observations.

Under the BIC, increased model complexity must be justified by a corresponding improvement in likelihood. This method was applied over a range of 1–20 hidden states and 1–30 iterations to produce Fig. 6. From these results, the optimal number of hidden states was identified as 11 and the number of iterations as 8. These relatively low values were chosen at the approximate convergence points seen in Fig. 6 to keep calculation times low.
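The selection can be reproduced in outline with a grid search scored by Eq. (19). The sketch below again assumes hmmlearn, uses a placeholder training array, and counts the free parameters of a diagonal-covariance Gaussian HMM.

```python
import numpy as np
from hmmlearn import hmm

def bic(model, X):
    """BIC = -2 log L + K ln N (Eq. 19) for a diagonal-covariance Gaussian HMM."""
    n, d = model.n_components, X.shape[1]
    k = (n - 1) + n * (n - 1) + 2 * n * d     # startprob + transmat + means + covars
    return -2.0 * model.score(X) + k * np.log(X.shape[0])

# Placeholder for the training split of the y-velocity data, shape (T, 1).
train = np.cumsum(np.random.default_rng(2).normal(0.0, 0.01, size=(400, 1)), axis=0)

scores = {}
for n_states in range(1, 21):                 # 1-20 hidden states
    for n_iter in range(1, 31):               # 1-30 training iterations
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=n_iter, random_state=0).fit(train)
        scores[(n_states, n_iter)] = bic(m, train)

best_states, best_iters = min(scores, key=scores.get)   # (11, 8) on the paper's data
```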

Fig. 6
figure 6

Optimal parameter identification. (a) Identification of the number of hidden states. (b) Identification of the number of training iterations

4 Forecasting parameter identification

4.1 Time delay

The propagation period was determined by summing the main likely sources of time delay in the system. The identified internal system latency was 0.48 s, i.e., the real-time delay between the UR5 movement command given by the operator and the movement of the robot itself, extracted via the control terminal of the MR system, as shown in Fig. 7. This value is consistent throughout the motion, indicating that the robot’s initial physical response time is relatively small compared to the communication delay for this robot. It is also consistent with the 0.5-s value reported by Wang [59] for a similar experimental setup.
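The authors extract this value from logged command and position data processed in MATLAB (Fig. 7). One plausible way to compute such a lag, sketched here in Python with placeholder log arrays and sample period, is via the peak of the cross-correlation between the two signals.

```python
import numpy as np

def estimate_lag(command, response, dt):
    """Estimate how far `response` lags `command` (in seconds) from the peak of
    their cross-correlation; both are equal-length 1-D logs of the y velocity."""
    c = command - command.mean()
    r = response - response.mean()
    xcorr = np.correlate(r, c, mode="full")
    lag_samples = int(np.argmax(xcorr)) - (len(c) - 1)   # > 0 when response lags
    return lag_samples * dt

# e.g. estimate_lag(cmd_log, robot_log, dt=0.022) would be expected to return
# roughly 0.48 s for logs such as those shown in Fig. 7.
```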

Fig. 7
figure 7

Hardware delay δt identification obtained through logged values processed in MATLAB software.

The average time step is 22 ms ± 3.5 × 10⁻⁴ ms, obtained from a simple distribution analysis of the raw data. The propagated uncertainty over a forecasting period of 5000 units (111.7 s) peaks at approximately 0.025 ms, as shown in Fig. 8. This interval is several orders of magnitude smaller than the average time step of 22.3 ms itself. Therefore, the use of the average δt for the propagated time is an appropriate choice, and the uncertainty introduced by this propagation is negligible relative to any shift from the “true” time in a hypothetical practical welding data set.
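For reference, the peak value in Fig. 8 is consistent with treating the per-step timing uncertainty as independent errors accumulating in quadrature over the forecast horizon (a reading that assumes the ±3.5 × 10⁻⁴ figure is in milliseconds):

$$\sigma_{T}=\sqrt{N}\,\sigma_{\delta t}=\sqrt{5000}\times 3.5\times 10^{-4}\ \mathrm{ms}\approx 0.025\ \mathrm{ms}$$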

Fig. 8
figure 8

Practical visualization of time-step error propagation limits

4.2 Error propagation

HMMs inherently carry larger uncertainty with smaller sample sizes [65]. Therefore, a more up-to-date HMM training set produces forecasts that better reflect the current human welder’s motion: it ensures a continuous increase in training data and incorporates changes in intention into the program as the environment changes.

Continuous retraining is required to minimize errors. The necessary parameters are a forecast length sufficient to cover the practical \(\delta {t}_{i}\) between communication segments, as in Fig. 4, and a calculation lead time \(\delta {t}_{\text{calc}}\) measured back from the tail end of each communication segment. Calculations are therefore started at a time \({t}_{\mathrm{seg}}-\delta {t}_{\mathrm{calc}}\) into the segment, where \({t}_{\mathrm{seg}}\) is identifiable from the resolution of the command packets sent by the system. Assuming the moment at which delay occurs cannot be identified, owing to the unpredictable nature of the operating environment, this approach yields:

$$\delta {t}_{\text{calc }}={\delta }_{\text{sample }}+{\delta }_{\text{train}}$$
(20)
$$\delta {t}_{i}={\delta }_{\text{remote }}+{\delta }_{\text{command }}+\delta {t}_{\text{calc}}$$
(21)

where \({\delta }_{\text{sample}}\) and \({\delta }_{\text{train}}\) are the sampling and retraining delays as previously defined, \({\delta }_{\text{remote}}\) is the predicted delay due to remote operation, taken as 2 s based on limitations reported in the literature [66], \({\delta }_{\text{command}}\) is the internal latency of 0.48 s, and \(\delta {t}_{i}\) thus represents the forecasting length required by the model.

Therefore, the recommended forecasting length is 2.8 s, or 150 cycles, which may be interpreted as the minimum feasible propagation length and includes retraining. This value is used for Fig. 9 c, d. Note that this is a single minimum value and may be adjusted situationally based on specifications or the operational environment.
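For illustration, the required forecast span can be assembled numerically from Eqs. (20) and (21); \({\delta }_{\text{sample}}\) and \({\delta }_{\text{train}}\) are placeholders below, since the paper does not report them individually.

```python
import math

# Placeholder component delays in seconds; delta_sample and delta_train are not
# reported numerically in the paper and are chosen here only for illustration.
delta_sample  = 0.15    # assumed sampling delay
delta_train   = 0.15    # assumed rolling retraining time
delta_remote  = 2.0     # remote-operation delay bound from the literature [66]
delta_command = 0.48    # measured internal latency (Sect. 4.1)
dt            = 0.022   # average command time step, ~22 ms (Sect. 4.1)

dt_calc = delta_sample + delta_train               # Eq. (20)
dt_i    = delta_remote + delta_command + dt_calc   # Eq. (21): required forecast span
cycles  = math.ceil(dt_i / dt)                     # forecast length in command cycles
```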

Fig. 9
figure 9figure 9

Raw data and processed results. (a) Raw human welding data. (b) Raw human welding data (red) vs. smoothed data (blue). (c) Raw human welding data (red) plus forecasted segment over 300 units or 6.6 s (blue). (d) Concatenated data set of human welding data with forecasted segment (red) vs. smoothed data set (blue)

5 Welding motion prediction

5.1 Results of forecasting

Figures 9 a and b show the raw data before and after processing through the assistive operation algorithm. Data was forecast for a period of 150 cycles, or 2.8 s, based on the calculated recommendation, as shown in Fig. 9 c. Finally, the smoothing technique was applied to the merged data set containing both raw and forecast data. From Fig. 9 d, the border between the two data sets is practically indistinguishable once both are processed with smoothing (blue line), and smoothing helps eliminate jitter and noise from unsteady operators or from electrical interference. It also greatly assists in integrating the forecast segment and enforcing a continuous command input stream. The generative aspect of the model is also clearly shown; the model could be propagated much further, although error propagation would constrain accuracy. The hidden state identification graph is shown in Fig. 10.
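The concatenation-and-smoothing step can be sketched as follows, again assuming hmmlearn for the HMM and pandas for the exponential smoothing; the data is a synthetic stand-in for the logged y-axis velocity, and the smoothing span is an illustrative choice.

```python
import numpy as np
import pandas as pd
from hmmlearn import hmm

rng = np.random.default_rng(3)
raw = np.cumsum(rng.normal(0.0, 0.01, size=(450, 1)), axis=0)   # stand-in for logged y-velocity

model = hmm.GaussianHMM(n_components=11, covariance_type="diag", n_iter=8).fit(raw)

# Generative forecast segment of 150 cycles (~2.8 s). A full implementation would
# condition the forecast on the last inferred hidden state rather than sampling
# from the initial state distribution as done here.
forecast, _ = model.sample(150)

merged = np.concatenate([raw, forecast])                          # raw stream + forecast (Fig. 9c)

# EWMA smoothing of the merged command stream (cf. Sect. 6); span is illustrative.
smoothed = pd.Series(merged.ravel()).ewm(span=20, adjust=False).mean().to_numpy()
```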

Fig. 10
figure 10

Hidden state identification graph. The hidden state variables on the y-axis are sequential integer values representing the welder’s intention as state variables

5.2 Validation of system design

Accuracy is the main performance indicator of the algorithm. The effect of increasing forecast time was analyzed against the root-mean-square error (RMSE), as shown in Fig. 11. As the forecast length increases, the RMSE also increases, as expected, indicating lower homogeneity with the raw data and supporting a higher retraining frequency. The HMM-based model satisfies both the homogeneity and retraining-frequency requirements compared to other algorithms. The RMSE comparison between different prediction lengths is shown in Table 1.
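An evaluation of this kind can be sketched as a sweep over the propagation ratio, scoring each forecast length against the held-out test data; the train/test arrays below are synthetic placeholders and hmmlearn is again assumed.

```python
import numpy as np
from hmmlearn import hmm

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Placeholder 2:1 train/test split of the velocity data (Sect. 3).
rng = np.random.default_rng(5)
data = np.cumsum(rng.normal(0.0, 0.01, size=(600, 1)), axis=0)
train, test = data[:400], data[400:]

model = hmm.GaussianHMM(n_components=11, covariance_type="diag", n_iter=8).fit(train)

# Sweep the propagation ratio: forecast that fraction of the test-set length and
# score it against the ground truth (cf. Fig. 11 / Table 1).
results = []
for ratio in np.linspace(0.05, 1.0, 20):
    n = int(ratio * len(test))
    forecast, _ = model.sample(n)
    results.append((ratio, rmse(forecast.ravel(), test[:n].ravel())))
```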

Fig. 11
figure 11

Percentage of test set data used for RMSE calculation against training set as a propagation ratio (bottom x-axis) and length of the propagation time (top x-axis) vs RMSE results

Table 1 RMSE comparison between different prediction lengths

The propagation ratio is defined as the ratio of the length of the predicted data set to that of the full test data set against which it is compared; a propagation ratio of 10% thus represents 10% of the length of the test set. The lower bound of the propagation ratio, 11.2%, corresponds to the calculated minimum forecasting length of 2.8 s and is represented by the blue line in Fig. 11.

The maximum usable propagation ratio is bounded in practice because the RMSE continues to increase with forecast length. The slope of the RMSE appears to partially stabilize at a 40% propagation ratio; however, in practical implementation, weld imperfections and difficult welding profiles will introduce additional noise and variability into the raw data.

From the graph, the RMSE minimum at a 5% ratio, 0.866 m/s, is roughly 40% lower than the maximum at 100%, 1.44 m/s. The minimum practical propagation ratio of 11.2%, at 0.892 m/s, is 38% lower than the maximum. A low RMSE implies a better fit with the test data and a forecast set that more accurately predicts welder movements.

The raw data (red) in Fig. 9 c, with an artificial interruption causing a 2.8-s delay, illustrates a potential implementation case of autonomous control during a period of unpredictable delay. Operation under continuous delay can be seen in Fig. 12, where the blue raw-data curve and the forecast scatter plot show similar human motion results compared with the simulated-delay control curve in red.

Fig. 12
figure 12

Implementation of HMM predictive model (blue scatter) vs. raw data (blue curve) and control curve without HMM application (with delay; red)

The forecast points from Fig. 12 were then experimentally applied. A welding experiment was conducted to predict welder movement intention and compare that against the welding motion measurements. The result indicates the HMM approach can produce smooth and continuous movement in the welding direction. The predicted portion of the weld is consistent with the portion based on the physical welder motion measurements. Acceptable weld surface continuity and quality bead shape are observed with the HMM prediction approach.

The RMSE comparison between the control curve without HMM and with HMM is shown in Table 2.

Table 2 RMSE comparison between control curve without HMM and with HMM

The RMSE of the raw data and the delay curve with smoothing, but no prediction, was 0.527 m/s, while that of the raw data and the HMM-applied forecast data was 0.177 m/s. Therefore, taking the fractional decrease from 0.527 to 0.177 m/s, we conclude that the application of the HMM forecasting model resulted in an RMSE reduction of 66%, implying higher accuracy and practical benefits of the algorithm.
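For clarity, the reported 66% follows directly from these two values:

$$\frac{0.527-0.177}{0.527}\approx 0.66$$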

The welding results of tele-welding with and without the HMM-assisted integration of raw and predicted motion with smoothing are shown in Fig. 13. Without HMM application, the weld seam shows waviness perpendicular to the torch travel direction in the welding plane. With HMM-assisted integration of raw and predicted motion and smoothing, the movement of the torch along the weld seam is adaptively regulated by the HMM, resulting in a smooth and consistent weld seam.

Fig. 13
figure 13

Comparison of welding results. (a) Welding result for robot-assisted welding without the HMM model and (b) welding result for robot-assisted welding with HMM-assisted integration of raw and predicted motion with smoothing

6 Discussion

HMM principles were applied to find optimal parameters for estimating and predicting welder intention and velocity data. The HMM is particularly suitable for two reasons. First, since the assumed hidden state of human welder intention is inherently not directly observable, the availability of the Baum-Welch algorithm to estimate the model parameters without labeled states is very convenient. Second, the model may be sampled in real time without requiring a distinct input. The results show that training time and latency are related and are test-subject specific. Training was performed 1000 times to ensure convergence of the model [59]. After training, the model is ready for smoothing and prediction, and the latency depends only on this final model and how much it was trained. For multi-user scenarios, where operators may have differing distributions of motion data, an operator-specific model will work best; in this case, it is recommended to train a model for each user to learn and abstract that user’s motion and control characteristics, which may require further time and model preparation.

In contrast, other algorithms such as LSTM architectures, a type of recurrent neural network, would slow operation and compound the delay, and may not be applicable in real time, which runs counter to the aim of this research [67]. Additionally, as HMMs operate under Markov and Gaussian assumptions, the required memory will not exceed the limits imposed by installation in a smaller, portable system for mobile operation. The objective of this study was to evaluate the differences between performing tele-welding tasks with and without the HMM predictive model; comparisons with and without HMM application were therefore included, and the change in unwanted noise and jitter between the two modes was analyzed. The use of alternative machine learning algorithms and their comparison with the HMM model will be studied in subsequent papers to obtain optimized welding performance, as we are also mindful of paper length.

A well-trained system with sufficient data will accurately forecast the output velocity set for welding, so even a 2.8-s delay is theoretically eliminated. Rather than addressing the source of the delay, which is impossible to eliminate with current technology and communications infrastructure, a generative system is utilized that treats delay as a lumped quantity regardless of its source. An HMM can therefore be used as a general-case, low-cost solution instead of improving each component separately to reduce latency; this approach uses HMMs and computation to circumvent increasing hardware requirements and costs.

The main limitation of this HMM approach is accuracy. However, the results (Fig. 12) show that the implementation of the algorithm already fits the data much better than the control curve without HMM, with an RMSE reduction of 66%. Eliminating the delay would achieve the goal of the system, which is to perform real-time welding as if the operator were directly controlling the welding torch. Without the trained model, there would be no advantage over an automatic welding system, as there would be no skilled human feedback. Consideration of other feedback sources in the system, such as the delay created by the video feedback between the MR headset and the computer, also increases the practical validity of the system. Future work may explore the identification of individual delay sources, their statistical properties, and the consequent optimization of the proposed algorithm.

Accuracy may be optimized by providing a large input data set, which is feasible in this application as the velocity measurement data is not difficult to obtain. Alternatively, the incorporation of additional factors for Bayesian estimation may raise accuracy. The algorithm itself may also be developed further, where more complex Bayesian inference methods, such as the Markov chain Monte Carlo (MCMC) approximation, can yield better stability and accuracy [68] than the overall maximization of likelihood as used in the Baum-Welch algorithm. In more delicate teleoperation settings, this alternative approach could be a point of focus. The effects of adjusting the re-training frequency could also be explored.

Smoothing is performed via an exponentially weighted moving average (EWMA) rather than the HMM, as the HMM algorithm is kept separate for motion prediction for simplicity and system architecture compartmentalization [69]. As in other works [59], the HMM could also be used for smoothing/assistive welding through human intention recognition. In this system, however, the welder’s intention to maintain a steady weld is assumed as a priori knowledge; without this assumption, the HMM algorithm could justifiably be applied to both problems.
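For reference, a standard EWMA recurrence of the following form would serve this purpose; the smoothing factor \(\alpha\) is not specified in this paper and would be tuned to the command rate:

$$s_{t}=\alpha {x}_{t}+\left(1-\alpha \right){s}_{t-1},\qquad 0<\alpha \le 1$$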

The incorporation of the cloud network, not covered in-depth in this paper, makes the system easy to implement, as all processing may be performed using cloud computing [64]. Additionally, this system is kept low cost, as it does not require expensive, dedicated equipment or communication lines. Together, these elements minimize set-up overhead and save time, as well as reduce investment in computational hardware [70].

Practical performance evaluation of the system with welding experiments, especially at extreme distances or conditions, is a potential area for future development. The cost-effective nature of improving software, rather than hardware, may also benefit similar systems and may be explored; for example, pick-and-place systems or other teleoperation applications.

7 Conclusions

This paper proposed a structure for the design and implementation of an HMM algorithm for semi-autonomous cooperative control in a telerobotic welding system. We presented the underlying formulae and assumptions governing the algorithm and implemented it as a proof of practicality. We focused on time delay as the main issue of the system, identified the minimum system time delay, and calculated the ideal projection period for smooth operation. Finally, we used the developed forecasting algorithm to enable assisted welding in the form of real-time smoothing, which is in turn facilitated by the elimination of time delay through predictive data generation.