A multi-resolution physics-informed recurrent neural network: formulation and application to musculoskeletal systems

This work presents a multi-resolution physics-informed recurrent neural network (MR PI-RNN), for simultaneous prediction of musculoskeletal (MSK) motion and parameter identification of the MSK systems. The MSK application was selected as the model problem due to its challenging nature in mapping the high-frequency surface electromyography (sEMG) signals to the low-frequency body joint motion controlled by the MSK and muscle contraction dynamics. The proposed method utilizes the fast wavelet transform to decompose the mixed frequency input sEMG and output joint motion signals into nested multi-resolution signals. The prediction model is subsequently trained on coarser-scale input–output signals using a gated recurrent unit (GRU), and then the trained parameters are transferred to the next level of training with finer-scale signals. These training processes are repeated recursively under a transfer-learning fashion until the full-scale training (i.e., with unfiltered signals) is achieved, while satisfying the underlying dynamic equilibrium. Numerical examples on recorded subject data demonstrate the effectiveness of the proposed framework in generating a physics-informed forward-dynamics surrogate, which yields higher accuracy in motion predictions of elbow flexion–extension of an MSK system compared to the case with single-scale training. The framework is also capable of identifying muscle parameters that are physiologically consistent with the subject’s kinematics data.


Introduction
The prediction of the evolution of state variables in dynamical systems has been a vital component to several scientific applications such as biology, geophysics, earthquake engineering, solid mechanics, robotics, computer vision [1][2][3][4][5][6][7] etc. Black-box techniques based on data-driven mapping and development of parameterized multi-physics models describing the progression of the data have been previously utilized for making predictions on the states.This task continues to be an active area of research due to challenges on many fronts, such as, the quality and scarcity of relevant physical data, the dynamics and complexity of the system, and the reliability and accuracy of the prediction model.
On the other hand, the characterization of parameters in the multi-physics models of these dynamical systems is also critical [8][9][10][11][12][13][14].The task is challenging due in parts to potential noise pollution captured by sensors in the system's measured data, as well as the potential of the parameter space being high-dimensional, leading to ill-posed problems that pose difficulties in numerical solutions.Standard optimization techniques such as genetic algorithms [15,16], simulated annealing [17], and non-linear least squares [18,19] have been employed for parameter identification, but can be computationally expensive and may not converge for ill-posed, nonconvex optimization problems that are encountered while solving inverse problems on MSK systems [15,20].
In this study, we focus on the application to musculoskeletal systems, where we aim to utilize noninvasive muscle activity measurements such as surface electromyography (sEMG) signals to predict joint kinetics or kinematics [1,18,19], e.g., for health assessment and rehabilitation purposes [15,16].These sEMG signals can be used as control inputs to drive the physiological subsystems, which are governed by parameterized non-linear differential equations, that form the forward dynamics problem.Hence, given information on muscle activations, the joint motion of a subject-specific MSK system can be obtained by solving a forward dynamics problem.Data-driven approaches for motion prediction have also been introduced to directly map the input sEMG signal to joint kinetics/kinematics, bypassing the forward dynamics equations and the need for parameter estimation [26][27][28][29][30].However, the resulting ML-based surrogate models lack interpretability and may not satisfy the underlying physics.Another challenge is that the sEMG signal usually exhibits a wide range of frequencies that are non-trivial for ML models [1] to map to the joint motion.
In our previous work [1], a physics-informed parameter identification neural network (PI-PINN) was proposed for the simultaneous prediction of motion and parameter identification with application to MSK systems.Using the raw transient sEMG signals obtained from the sensors and the corresponding joint motion data, the PI-PINN learned to predict the motion and identify the parameters of the hill-type muscle models representing the contractile muscle-tendon complex.A feature-encoded approach was introduced to enhance the training of the PI-PINN, which yielded high motion prediction accuracy and identified system parameters within a physiological range, with only a limited number of training samples.However, this method relies on mapping in a feature domain constituted by Fourier and polynomial bases, which requires the input sEMG signal to span over the entire duration of the motion.Thus, it prevents real-time predictions as the signal is obtained from the sensor.
To enhance the predictive accuracy of the time-dependent signals, recurrent neural networks (RNNs) such as gated recurrent units (GRUs) [29,53,54] are utilized in this study to inform predictions with the history information of the motion.To overcome the limitation of the size of the data and provide more information from the composite frequency bands in the signals, a multiresolution based (MR) approach is proposed.Wavelets are used to decompose the raw sEMG and joint motion signals into coarse-scale components at various frequency scales and the remaining fine-scale details.Using principles of multi-resolution theory and transfer learning, multiresolution training processes are repeated recursively from coarse-scale to the full-scale to map the sEMG to the joint motion.To enhance the robustness and generalizability of the model, gaussian noise is introduced to the recorded motion data used for training [29].The trained model can be applied for real-time motion predictions given the raw sEMG signal obtained from the sensor.This manuscript is organized as follows.Section 2 introduces the subsystems and mathematical formulations of MSK forward dynamics, followed by an introduction of the proposed multiresolution PI-RNN framework for simultaneous motion prediction and system parameter identification in Section 3. The following sections verify the proposed framework using synthetic data and validate it by modeling the elbow flexion-extension movement using subject-specific sEMG signals and recorded motion data in Section 4 and 5, respectively.Concluding remarks and future work are summarized in Section 6.

Dynamics
This section provides a brief overview of muscle mechanics and forward dynamics of the human MSK system, with details in Appendix A and B. As depicted in Fig. 1, multiple subsystems within the MSK forward dynamics interact hierarchically: 1) the neural excitation () transforms into muscle activation () (activation dynamics); 2) Muscle activation drives muscle fibers to produce force   (muscle-tendon (MT) contraction dynamics); 3) the resultant forces produce joint motion q (translation and rotation) of MSK systems, called the MSK forward dynamics [11,12,37].

Fig. 1:
The subsystems involved in the forward dynamics of an MSK system are depicted in this flowchart.Neural excitations are transmitted to muscle fibers (activation dynamics) that contract to produce force (muscle-tendon contraction dynamics).These forces generate torques at the joints (structural level MSK dynamics) leading to joint motion [1,55].

Neural Excitation-to-Activation Dynamics
While activations () in the muscle fibers can be obtained through a non-linear transformation on neural excitations (), they are difficult to measure in-vivo.Therefore, the excitations are estimated from [15,16] the raw sEMG signals () considering an electro-mechanical delay: where  measures the delay between the neural excitation originating and reaching the muscle group.The muscle activation signal () is then expressed as, where  is a shape factor.These activations initiate muscle fiber contraction leading to force production from the muscle group.Forces in the muscle-tendon (MT) complex are generated by the dynamics of MT contractions, where for structural length scale behaviour of the MT complex, homogenized hill-type muscle models are utilized (described in Appendix B) .Each muscle group can be characterized by a parameter vector,
where  is the activation function in Eq. ( 2),  ̃ is the normalized muscle length,  ̃ is the normalized velocity of the muscle.In this study, the tendon is assumed to be rigid (  =    ) which simplifies the MT contraction dynamics [58,59] accounting for the interaction of the activation, force length, and force velocity properties of the MT complex.More details can be found in Appendices A and B.

MSK Forward Dynamics of Motion
Body movement is the result of the force produced by actuators (MT complexes), converted to torques at the joints of the body, leading to rotation and translation of joints, which are considered as the generalized degrees of freedom of an MSK system ().The dynamic equilibrium can be expressed as where , ̇, ̈ are the vectors of generalized angular motions, angular velocities, and angular accelerations, respectively; () is the torque from the external forces acting on the MSK system, e.g., ground reactions, gravitational loads etc.; () is the inertial matrix;   is the torque from all muscles in the model calculated by   (, , ̇; ) = ()  (, , ̇; ), where () are the moment arm's and   (, , ̇; ) are the forces from the MT complex.Given the muscle activation signals  , initial conditions and parameters of involved muscle groups  , the generalized angular motions  and angular velocities ̇ of the joints can be obtained by solving Eq. ( 5).An example of these vectors is shown in Section 4 and Appendix D.

Multi-Resolution Recurrent Neural Networks for Physics-Informed Parameter Identification
This section describes the recurrent neural network algorithms, followed by the physics-informed parameter identification that enables the development of a forward dynamics surrogate and simultaneous parameter identification.The employment of multi-resolution analysis based on fast wavelet transform [60,61] for training data augmentation is then defined.The computational framework for multi-resolution recurrent neural network for physics-informed parameter identification is also discussed.The computational graph of a standard recurrent neural network (RNN) and its unfolded graph is shown in Fig. 3.The hidden state  allows for RNNs to learn important history-dependent features from the data in sequential time steps [29,53,54].The unfolded graph shows the sharing of parameters across the architecture of the network, allowing for efficient training.The forward propagation of an RNN starts with an initial hidden state that embeds history-dependent features and propagates through all input steps.Considering an RNN with  history steps as shown in Fig. 3, the propagation of the hidden state can be expressed as follows [29].

Recurrent Neural Networks and Gated Recurrent Units
The hidden state at the final (current) step  is then used to inform the prediction.
̂ =  ℎ   +   (7) Here,  ℎ is the hyperbolic tangent function;  ℎ ,  ℎℎ , and  ℎ are the trainable weight coefficients;  ℎ and   are the trainable bias coefficients.The trainable parameters are shared across all RNN steps.Let   = [  ,   1 , … ,     ] be the current time and current sEMG data of the   muscle components and  ̂ be the predicted joint motions at the current time   .Fig. 4(a) illustrates the computational graph of an RNN model trained to predict the motion at step  by using m history steps of  and  as well as the  at step .The forward propagation is defined as ̂ =  ℎ ̂ +   (10) with trainable parameters including the weight coefficients  ℎℎ ,  ℎ ,  ℎ and  ℎ ̂ and bias coefficients  ℎ and   .During training, the 'teacher-forcing' method is used where the measured motion data is given to the model in the history steps.In test mode, the model is fed back to the previous predictions as input to inform future predictions.The inputs received in this scenario could be quite different from those passed through in the training process, leading the network to make extrapolative predictions and therefore, accumulate errors which will pollute the predictions.
To improve the testing performance and enhance model accuracy and robustness, a user-controlled amount of random Gaussian noise is added to the recorded motion data to introduce stochasticity so that the network can learn variable input conditions, resembling those in the test mode, see [29] for details.Standard RNNs, however, have difficulties in learning long-term dependencies due to vanishing and exploding gradient issues arising from the recurrent connections.To mitigate these issues, gated recurrent units (GRUs) have been developed [48,50].A standard GRU consists of a reset gate   , that removes irrelevant history information, an update gate   that controls the amount of history information that is passed to the next step, and a candidate hidden state  ̃ that is used to calculate the current hidden state   [53,62].Considering a GRU with  history steps, the forward propagation can be expressed as follows [29]: ̂ =  ℎ ̂ +   (13) where  [21].Training occurs by plugging in the measured motion data in history steps (shown in Fig. 5), known as the teacher forcing procedure [21].For predictions, the prediction from the previous step is used to predict the current step.The addition of gaussian noise to measured data, as described before, is adopted in GRU models as well.(11)) and, (b) where the hidden state  −1 is plugged back in to the GRU along with input   at step  to predict the motion  ̂ (Eq.( 12)-( 13)).The '+' cell produces an output (arrow pointing outwards) that is the summation of the inputs (arrows pointing into the cell).

Simultaneous Forward Dynamics Learning and Parameter Identification
With the governing equations for a general MSK forward dynamics (Section 2.1), the following parameterized ODE system is defined as where the differential operator [(⋅); ] is parameterized by a set of parameters .The right-hand side (; ) is parameterised by .[(⋅)] is the operator for initial conditions, and  is the vector of prescribed initial conditions.To simplify notations, the ODE parameters are denoted by  = {, }.The solution to the ODE system : [0, ] → ℝ depends on the choice of parameters . Here where   denotes RNN evaluations (depending on model chosen) discussed in Eq. ( 11)-( 13).
The optimal RNN parameters  ̃ and the ODE parameters  ̃ are obtained by minimizing the composite loss function  as follows, where  is the parameter to regularize the loss contribution from the ODE residual term in the loss function and can be estimated analytically [1].The data loss is defined by, where  ̂() is the predicted motion, and   is the recorded motion of MSK joints.In addition to training an MSK forward dynamics surrogate, the proposed framework aims to simultaneously identify important MSK parameters from the training data by minimizing residual of the governing equation of MSK system dynamics in Eq. (5).
where ( ̂(); ) is the residual associated with Eq. ( 14) for the  ℎ sample;  = {, } represents the ODE parameters relevant to the MSK system.The gradients of the network outputs with respect to the network parameters (), MSK parameters (), and inputs are needed in the loss function minimization in Eq. ( 16), which can be obtained efficiently by automatic differentiation [63].The formulation in Eq. ( 15) is general such that more advanced RNN frameworks can be used such as the GRU described in Eq. ( 11)-( 13).

Multi-Resolution Training with Transfer Learning
To improve the training efficiency of RNN for MSK applications with mixed-frequency sEMG input signals and low-frequency output joint motion, a multi-resolution decomposition of the training input-output data is introduced in Section 3.3.1,followed by the transfer learning based multi-resolution training protocols to be discussed in Section 3.3.2.

Wavelet based Multi-Resolution Analysis
Consider a sequence of nested subspaces A mutually orthogonal complement of   in  +1 is   , such that, where ⊕ is a direct sum.This subspace   is spanned by a set of wavelet functions  , (), i.e., where () is the mother wavelet.It follows that, and therefore, The two-scale dilation and translation relations for the scaling functions can be written as Orthogonal wavelet functions can be obtained by imposing orthogonality conditions between scaling and wavelet functions in the frequency domain using Fourier transform, where   is the coefficient.
Orthogonal scaling functions can be constructed by choosing a candidate function  * () such that  * () have reasonable decay and a finite support.In addition, ∫  * () ≠ 0. It should also satisfy the two-scale relation, With these, an orthogonal scaling function () can be expressed in terms of  * () as It is then possible to define the scaling function at the coarse scale in terms of the scaling function at the fine scale and the wavelet functions at the coarser scale, Any function can be approximated at scale [] by using  , as a basis as well as using its coarse scale [ − 1] representation and details at the coarse scale, i.e., where   and   are the operators projecting  onto the subspaces   and details of  at scale [] in the orthogonal subspace   , respectively.  [] and   [] are the corresponding basis coefficients at the coarse scale [].While the example shown here is for a one-dimensional case, this multiresolution representation can be extended to multi-dimensions.

Multi-Resolution Data Representation and Training Protocols
In this approach, a given signal () is represented using the multi-resolution scaling functions and wavelets.A scale [] representation of signal () can be obtained from the scale [] ( > ) representation with the addition of wavelet components (high frequency components) of the scales higher than [], using the discrete wavelet transform modified from Eq. ( 27), where   is the projection operator at scale [] and   are the wavelet projectors of the signal that are added from scale [] to scale [ − 1] to reconstruct the signal at scale [];   [] and   [] are the scaling and wavelet function's coefficients, obtained by the orthogonality condition as given in Here we consider a general MSK system described in Section 2. The original unfiltered data is denoted as scale [0], which will be decomposed into a sequence of lower scales [−],  ∈ ℤ + for multi-resolution training.
Let  [0] be the input training data at the full-scale ( = 0) of the raw signals i.e., ], [0] = [  ,    1[0] , … ,     [0] ]. ( and the motion of joints of the MSK system at the  ℎ time-step at the full-scale such that the array of the unfiltered motion data for the duration of the motion is ].
From MR theory, subtracting details from the fine scale representations at the full-scale of the signal, i.e., [0], results in a course scale representation of the signal at scale [−],  = 1, … .The projected training data at coarse scale [-j] is defined as where   is the total number of data points and , are obtained from the original raw data   [0] and   [0] by wavelet projection using Eq. ( 27), that is, [−] () ≡    [0] () =  −1  [0] () +  −1  [0] () =  [0] () − ∑    [0] () where   and   are the projection operators in multi-dimensions.Hence, datasets that contain lower resolution representations of the original signal at scales [0] can be expressed as: where Instead of learning the signal mapping from input original raw sEMG data  [0] to motion data  [0] , we initiate learning the mapping by starting from a coarse scale representation of the input-output data at scale [−] and map  [−] to  [−] .For multi-resolution RNN, the initial learning starts from the coarsest scale [−] as follows: At the next finer scale [− + 1], the weights at scale [−] (using an early stopping [64]) are used as the initial values for  ℎℎ , similar to the concept of transfer learning [65].
Similarly, for multi-resolution GRU, the initial learning starts from the coarsest scale [−] as described in Appendix C. The same procedures to transfer the NN parameters in Eq.( 34)-( 36  radians and ̇(0) = 0 radians/ sec and the parameters in Table 1, the motion of the elbow joint, , can be obtained by solving the MSK forward dynamics problem using a synthetic solver.2. As the maximum value of the noiseless sEMG signals is 1, the chosen 's were kept within 10% -20% of the signal maximum for a reasonable level of noise.Then the corresponding output motions are generated by passing the noisy sEMG as input to the FD equations in Section 2. The following training procedures were performed for each of the three cases.

1-scale Training
The mixed frequency input sEMG signals and corresponding output motion data  at scale [0], denoted by  [0] and  [0] , respectively, are mapped to get a baseline performance.This is termed as 1-scale training as only the full-scale (i.e., [0]) of the mixed frequency data is used for training.

2-scale Training
a. Initiate learning from a coarse scale representation of the mixed frequency input data at scale [−1] and map  [−1] to the corresponding motion data at scale [-1],  [−1] , of that case.
b. Transfer parameters to the next scale training and finish the learning by mapping  [0] to  [0] .

3-scale Training
a. Start learning from a coarse scale representation of the mixed frequency input data at scale [−2] and map  [−2] to the corresponding motion data at scale [-2] ,  [−2] , of that case.b.Transfer parameters to the next scale training and continue learning by mapping  [−1] to c. Transfer parameters to the next scale training and finish the learning by mapping  [0] to  [0] .Table 2: Input data and gaussian noise level for each case.

Case ID
Input Synthetic sEMG data + (, ) Original +(0,0.15) 3 Original +(0,0.2) For each case and for each of the training scales in that case, the training data samples contained the data of trial's 1, 2, 4, 5 while trial 3 was used for testing, each trial with  =500 data points.
The MSK parameters  = {  } =1 4 = { 0,  ,  0,  ,  0,  ,  0,  } were chosen to be identified from the training data using the proposed framework.Due to differences in units and physiological nature of the parameters, the conditioning of the parameter identification system could be affected.To mitigate this issue, normalization [1,44] was applied to each of the parameters, where   (0) was the initial value of the parameter.Therefore, the parameters to be identified became  ̅ = { ̅  } =1 4 .
The proposed framework, as described in Section 3, was applied to each case to simultaneously learn the MSK forward dynamics surrogate and identify the MSK parameters  ̅ by optimizing Eq. ( 16), where the residual of the governing equation for the current time step , was expressed as and is included in the residual term   in the loss function in Eq. ( 16).While the training happens sequentially from coarse to fine-scales of the motion, the final identification of parameters happens at the scale [0], i.e., the full-scale in each of the 1-, 2-and 3-scale MR training types.
A GRU with 2 history steps, 1 hidden layer and 50 neurons in each layer was used.The training was performed using the Adam algorithm [66] with an initial learning rate of 1 × 10 −3 and the penalty parameter for the MSK residual term in the loss function,  ∝ Δ 2  = 10 −3 .Δ is the timestep between data points and  is the moment of inertia in Eq. (38).Multiple parameter initialization seeds were used for an averaged response of the MR training.
To compare the post-training performance of 1-, 2-and 3-scale MR training's, the average testing mean squared error (MSE) and testing R 2 scores were compared, where these measures for a single trial are defined as: where  is the motion data of the trial,  ̂ is the trial's predicted motion from the MR PI-RNN framework, and  ̅ is the mean of trial's motion data with  being the number of data points in the Together, this reduction in bias and growth in variance leads to a better generalization performance.
Computationally, this method improves accuracy in the same amount of training epochs showing the efficiency of this method.As generalization predictions post-training are made using the fullscale of the data, there is no increase in time needed to perform the forward pass for any scale.
Meanwhile, the MSK parameters,  0  (maximum isometric force) and  0  (optimal muscle length corresponding to the maximum isometric force), of both the biceps and the triceps were accurately identified from the motion data, as shown in Table 3.Compared with the parameter identification from our previous work [1] where in addition to  0  , the maximum contraction velocity   For the identification of optimal muscle length parameters ( 0  ), the initial points need to be chosen with respect to constraints applied by the geometry of the MSK system.The errors reported in

Application of MR PI-RNN method to Subject-Specific Data
The recorded motion data and sEMG signals were collected and processed as per the data acquisition protocols mentioned in [1].Three elbow flexion-extension motion trials were performed by the subject, with the sEMG sensors placed on the biceps and triceps muscle groups.
The processed sEMG signals were transformed as described in Section 2.  In this example, the raw sEMG signals were used as input.A 5-scale MR training procedure as described in Section 4 was used on a GRU with 1 hidden layer with 50 neurons.The data of trials 1 and 3 were used for training, while trial 2 was used for testing, where each signal contained 500 temporal data points.

Parameter Identification
The muscle parameters to be identified by the framework include the maximum isometric force and the optimal muscle length from both muscle groups, which are denoted as  = { 0,  ,  0,  ,  0,  ,  0,  } .It was observed in our tests that despite the normalization process described in Eq. ( 37) and (38), the parameters obtained at the end of the MR training with motion data either diverged or converged to non-physiological values.To obtain physiologically consistent parameters, we use the values obtained from literature studies and constrain the space of parameter search [44].
Let the parameter to be identified be defined as where ̅  is the value defined in the  ℎ literature study and   is the parameter to be optimized in the training such that it can be used to evaluate the sigmoid function sig(  ) and  is the vector of these trainable parameters.Using the optimized  , the desired MSK parameters can be estimated.This formulation constrains the identified parameters to be consistent with parameters obtained through experimental studies [68][69][70].
The proposed framework was then applied to simultaneously learn the MSK forward dynamics surrogate and identify the MSK parameters  by optimizing Eq. ( 16), where the residual of the governing equation for the current time step , was expressed as As mentioned in the verification example (Section 4), the final parameter identification happens at the full-scale, i.e., at [0].

Results
The training was performed using the Adam algorithm [66] with an initial learning rate of 1 × 10 −3 and 4 history steps were considered.Multiple parameter initialization seeds were used for an averaged response of the MR training.To quantify the error in the testing predictions, a normalized mean squared error was defined, where   is the  ℎ target motion data point,   ̂ is the  ℎ predicted motion data point, from the MR PI-RNN framework, and  ̅ is the mean of target motion data.The R 2 score was calculated using the metric defined in Eq. (40).From Fig. 10, Fig. 11 and

Discussion and Conclusions
In this work, we proposed a multi-resolution physics-informed recurrent neural network (MR PI- Computationally, it is noted that the proposed method achieves improved accuracy by using the same amount of training epochs.
The proposed MR framework was validated on recorded sEMG and motion data from a subject [1] and significant improvements were observed in the testing prediction accuracy, with 1-scale training often leading to large errors.The predicted motion at higher training scales showed improvements across all initialization points used, indicating the robustness of the method.The identified parameters were also consistent with the physiological range observed in literature.
This method also has the advantage of operating in the time-domain as compared to the featureencoded (FE) training [1], where the input sEMG signals were projected on to the frequency domain using the Fourier basis.In the FE training, to make a prediction, the input signal for the entire duration of the movement prediction was needed whereas the physics informed MR training of the RNN enables the trained model to make real-time predictions by using the information of the previous time-steps and the current sEMG signal.In addition, for mixed frequency signals, wavelet resolution can better capture the local frequency information as compared to the Fourier basis which captures the global frequency information.
This method is presented as a general case where multi-resolution is applied to both input and output.For some applications, for e.g., those that require only data mapping, the MR training can be applied by only considering the decomposition to the input, keeping the output at the full-scale (i.e., scale [0]) throughout, or vice versa.To further improve this method, we can consider the use of multi-resolution as activation functions of the ML framework, instead of relying on data filtration processes for better computational efficiency.This method will also be tested on other physics-informed ML techniques to solve forward problems with PDEs having mixed frequency source terms.

Acknowledgment
The support of this work by the National Institute of Health under grant number 5R01AG056999-04 to K. Taneja & J. S. Chen are very much appreciated.

ForceFig. 2 :
Fig. 2: A muscle-tendon complex in the arm modelled by a homogenized hill-type model where muscle group's in (a) are a homogenized muscle-tendon (MT) complex described by the model shown in (b).

Fig. 3 :
Fig. 3: Computational graph of a standard recurrent neural network using 'm' history steps for prediction.

Fig. 4 :
An example computational graph of an RNN that uses one history step: (a) The train mode and (b) the test mode, where the motion predicted from the previous step is used as part of the input to predict motion at the current step.

Fig. 5 :
Fig. 5: An example computational graph of a GRU in train mode that uses one history step: (a) Starting with an initial or previously obtained hidden state ( −2 ), the main GRU cell takes the input  −1 and motion  −1 that are used to obtain the GRU hidden state  −1 at step  − 1 (Eq.

Section 3 . 3 . 1 .
Using the Wavelet transform to represent a time series under multiple resolutions offers advantages for feature extraction from signals.Compared to the Fourier transform which offers only localization in the frequency domain, the Wavelet transform provides both frequency and time domain localization, making it more suitable for time history (or sequence) learning algorithms such as the standard RNN and its enhanced variant GRU.More specifically, one can enhance training efficiency by using a sequential training strategy for the time-history input (sEMG) and output (joint motion) data.Applying the Fast Wavelet Transform [57,58] to obtain the input and output data from low to high resolutions results in better generalization performance of the RNN trained to map from sEMG signals to joint motion time history as described below.The second order Daubechies wavelets are used in this work.
is the input data of scale [-j] at time step .The motion of the MSK joints at the  ℎ time-step at the scale [−] is   [−] .The data sets for a representative muscle group '',   [−] and motion   [−] ) are repeated with [−] → [− + 1] until it reaches scale [0].To enhance model accuracy and robustness, variations based on Gaussian noise are added to the motion data in each sequential step, as suggested by [29].The sequential MR training process is described in Algorithm 1.

Fig. 6 :
Fig. 6: An overview of the application of this framework to the recorded motion data.The location of motion capture markers is circled in red and the sEMG sensors on Biceps and Triceps muscle groups in blue and green, respectively.The simplified rigid body model was used in the forward dynamics equations within the framework with appropriately scaled anthropometric properties (for geometry) and physiological parameters (for muscle-tendon material models).The raw sEMG signals were mapped to the target angular motion of the elbow and used to simultaneously characterize the MSK system using the proposed Multi-Resolution PI-RNN framework.

Fig. 7 :
Fig. 7: The original 'noiseless' input data set with the synthetic biceps and triceps sEMG signals having variations in frequency for five trials are shown at the top.Increasing levels of noise are added to develop 3 cases of synthetic mixed frequency input sEMG, from which corresponding output motions are solved, using the forward dynamics equations.To verify the MR framework, these three cases with their respective mixed frequency input data are then mapped to their corresponding motion data.
̂k [−] (  ) −  ( ̂ [−] (  )) −   (  (  ),   (  ),  ̂ [−] (  ),  ̂k [−] (  ); ( ̅ ;  (0) )) trial.At each epoch in the MR training, the training loss is calculated by using the scale of the training data used in that training scale, i.e., scale [−] of the data is used in -scale training.The gradual improvement in these metrics is evident from Fig. 8 where, as further scales of information are added and the training data is augmented, the generalization performance shows improvement from 1-scale to 3-scale.Overall, it is noted that the test metrics such as the MSE reduces, and the R 2 score gets closer to one, indicating an increase in the generalization accuracy as more training scales are introduced.This can be explained through the theory of bias-variance tradeoff; training on various scales of the data introduces more variance to the training, helping the ML framework to reduce the bias it develops by just training on the full-scale of the data.
was independently identified, due to non-convergence of  0  by the time-domain and feature-encoded trainings, the proposed method can accurately identify  0  .

Table 3
are calculated by taking the average of the percentage error of the identified MSK parameters from the 3-scale training with the multiple parameter (, ) initializations.It was observed that in MSK parameter identification, similar accuracy in characterization was obtained from all training scale approaches used within each case, with errors less than 1%.This indicates that the MR PI-RNN improves the generalization performance of the motion prediction, without loss in parameter identification accuracy.

Fig. 8 :
Fig. 8: The training loss and testing metrics are shown.The shaded area in the loss and average test MSE and R 2 score figures is one standard deviation from the mean (solid line).As more scales of data are introduced in the MR training, the average Test MSE and R 2 score calculated posttraining improve in each case.
1 to obtain muscle activation signals, used to calculate the MSK forward dynamics ODE residual.The same simplified rigid body model was used as in Section 4 and appropriately scaled anthropometric properties (for the geometry of the model) and physiological parameters (for muscle-tendon material models used for the muscle groups) based on the generic upper body model defined in [68,69] were used.Fig. 9 shows the measured data of the three trials, including the transient raw sEMG signals and the corresponding angular motion of the elbow flexion-extension of the subject.

Fig. 9 :
Fig. 9: The measured raw sEMG signals and the corresponding angular motion of the elbow flexion-extension of the subject are plotted.

Fig. 10 :Fig. 11 :
Fig. 10: Comparison of test predictions post-training for each MR training scale performed.The solid dash line is the mean of the predictions post-training when various initialization points are utilized to begin the MR training, with shaded region indicating one standard deviation from the mean.
RNN) for an application to MSK systems, for time-domain motion prediction and parameter identification.A GRU with a physics-informed loss function that minimized the error in the training data and the residual of the MSK forward dynamics equilibrium was used for this purpose.Wavelet based multi-resolution techniques were used to decompose the input sEMG signals and output joint motion data into coarse-scale approximations at different scales and fine-scale details at those scales.The sEMG and joint motion multi-scale components were then mapped to each other starting from a chosen coarse-scale components and then sequentially trained (via transfer learning) to higher scales, completing the training on the full-scale of the data.By initializing training on the coarse-scale of the training data, the optimization reaches a local minimum that serves as a better initialization state for the training data that includes the sequential fine-scale details.The proposed transfer-learning based sequential training scheme can be used for learning datasets that have high frequency signals as shown in the verification example with synthetic mixed frequency sEMG data.The numerical examples show an improvement in testing prediction and identifying the parameters.We observe from the loss profiles that the testing loss decreased while the training loss increased as more scales of data were brought in.It was also observed that the average test MSE and R 2 metrics showed a clear improvement in the generalization accuracy.These phenomena can be explained through the theory of bias-variance tradeoff; training on various scales of the data introduces more variance to the training, helping the ML framework to reduce the bias it develops by just training on the full-scale of the data.
, an RNN is used to relate data inputs containing discrete sEMG signals and discrete time from all the  previous history time-steps of a trial, ∪ =−    ∈ ℝ   ,  ∈ ℤ + , to discrete joint motion data outputs at the current time-step,   ∈ ℝ , approximating the MSK forward dynamics.Let the training input at the  ℎ history step be defined as   = [  ,   1 , … ,     ], where   denotes the time at the  ℎ time step, and {   } =1   denotes the sEMG signals of   muscle groups involved in the MSK joint motion at   .The motion at time step , is then predicted using the training input from all the previous  steps using the RNN. ̂() =   (  ,  −1 ,  −1 , … ,  − ,  − ; )

Table 1 :
Parameters involved in the forward dynamics setup of elbow flexion-extension motion.

Table 3 :
The average percentage error (shown as mean ± standard deviation) between predicted and true values of the parameters for 3-scale training for each case from multiple initialization points.

Table 4
, it is clear that addition of training scales leads to improved motion predictions.The multi-resolution training leads to an increase in average test R 2 score of more than 40% (bringing it closer to one), averaged over multiple initialization seeds.With the addition of more scales, Fig. 10 clearly shows the progression in improvement of the predictions as more scales are involved in the training.

Table 4 :
The test metrics such as NMSE and R 2 score averaged over multiple initialization seeds, for the various training scales involved are reported here.The % decrease in average NMSE and % increase in average R 2 w.r.t 1-scale training are shown in the 3 rd and 5 th columns respectively.