1 Introduction

Sit-to-stand (STS) transitions are among the most challenging movements required for independent daily living and are often compromised by aging and by neurological and physical conditions such as stroke, spinal cord injury, osteoarthritis, and muscular dystrophy. A healthy adult performs on average 60 (±22) STS transitions each day [1], and this action serves as the fundamental first step toward ambulation. STS transition tests are also often used in clinical practice as functional performance measures and to estimate lower-extremity muscle strength [2–4]. In healthy individuals, an STS transition cycle from a static seated position to a stable upright stance typically takes less than 3 seconds [5] and is divided into two to five phases for analysis, depending on the specific criteria or perspective of the study [6–9]. STS is initiated by leaning the torso forward; momentum is then transferred from the upper body to the whole body, the hips and knees extend to reach a standing position, and STS is complete when standing stability is achieved. The gluteus maximus (hip extensor), the quadriceps (knee extensors), and to a considerable extent the hamstrings are among the primary muscles contributing to the STS transition [10], while the muscles surrounding the ankle joint contribute to balance and stability. We believe that reducing the strain on these major muscle groups by assisting the knee and hip joints may make STS transitions less effortful.

Musculoskeletal (MSK) modeling of the STS motion allows us to analyze parameters that are inherently difficult to measure noninvasively, such as individual muscle forces and joint reaction forces. Caruthers et al. created a custom 3D MSK model and used it to identify and study the individual muscle contributions to accelerating the whole-body center of mass during STS [11]. Likewise, Smith et al. used an MSK modeling approach to explain the STS difficulties experienced by older adults [12]. Several studies have also used predictive musculoskeletal simulations to understand the high-level physiological controllers in the human body for STS motions. Norman-Gerum et al. developed a three-link planar model and used Bézier curves to prescribe STS trajectories [13]. Kumar et al. used an open-loop single-shooting optimization framework to generate STS trajectories under different lower-limb muscle strength deficits [14]. Their cost function was a linear combination of ten cost terms that encouraged a stable STS trajectory, including minimization of control effort, penalties for breaching joint limits, penalties for any joint movement at the end time of the simulation, penalties for excessive body accelerations, and others. In addition, hip versus knee reserve actuators were explored under severe strength deficits to identify the muscle group responsible for STS failure (the vasti). Munoz et al. used vestibular and muscle length reflexes to simulate STS within the SCONE software environment, which also uses a shooting-based optimization approach. The cost function in that study did not include any effort measure and combined only degree of freedom (DOF) measures for each STS phase, in order to emphasize the role of vestibular input in STS [15]. Gordon et al. used an inverse MSK optimal control framework (a bi-level optimization technique used to identify cost functions) to learn personalized STS motion strategies during perturbed STS [16] and contributed valuable insights showing that humans modulate STS strategies under instabilities in a subject-specific way.

Numerous researchers have also studied STS motions with external devices such as robotic devices that apply forces to different parts of the body [17] or wearable exoskeletons that apply joint torques with the primary goal of helping individuals regain their independence when confronted with mobility constraints. Many of these devices focus on applying volitionally-triggered desired STS trajectories through position control or impedance control, which is adequate when assisting severely impaired individuals, while others have also utilized a model-based approach. However, providing optimum assistance with optimal timing for maximal biomechanical benefit, while also ensuring stability throughout the movement, remains a challenging task. Choi et al. explored the effect of peak assistance timing in reducing knee extensor muscle activations and concluded that a peak knee assistive torque applied between 25% and 40% of the STS cycle was most effective [18]. Utilizing functional electrical stimulation on the knee extensors, combined with synchronized knee assistive torque, as exemplified in Alouane et al. [19], represents a hybrid approach also employed by various researchers. A pilot experimental STS study performed while wearing an active hip exoskeleton revealed minimal differences in the joint kinematics, while the hip assistance provided was capable of reducing muscle activations in the gluteus maximus and hamstrings [20]. Similarly, Myosuit, a hybrid cable-driven exosuit with elastic bands between limbs developed by Schmidt et al., was able to reduce the muscle activation levels in the gluteus maximus muscles up to 60% [21]. A model-based control for STS transitions that considered the exoskeleton dynamics and its contacts with the environment was developed and implemented to apply assist-as-needed torques to the lower-limb joints by Vantilt et al. [22].

Recently, deep reinforcement learning (DRL) has gained popularity for the control of MSK models [23–26] and exoskeletons [27, 28] due to advancements in computational capabilities. The redundant nature of the motor control problem makes DRL a well-suited tool for devising robust controllers for exoskeletons. Using such an approach, Jamali et al. applied a Q-Learning method to find the optimal joint moments during STS movement for a simple torque-driven linkage model representing the human body [29]. Additionally, a DRL controller for a robot-assisted standing seat was tested to optimize STS transitions, with user satisfaction serving as the reward metric [30]. In this study, we aim to obtain a biologically realistic representation of the unassisted human STS movement by employing a muscle-driven model and extending it to exoskeleton-assisted STS.

The primary objective of this paper is to propose a DRL framework for idealized STS assistances from an exoskeleton, specifically targeting the hip and knee joints due to their critical roles in facilitating STS movements. The hip is essential for initiating STS movements and pushing the body vertically upwards [11], while limitations in the force production capabilities of the knee extensor muscles are often the primary constraint in STS performance [31]. This framework is designed to deliver tailored, idealized torque profiles that synchronize seamlessly with human movements during STS maneuvers. Additionally, we explore the biomechanical changes associated with the use of these STS controllers. Four scenarios of STS are simulated and studied: 1) without exoskeleton assistance, 2) with hip assistance (H-Exo), 3) with knee assistance (K-Exo), and 4) with hip plus knee assistance (H+K-Exo). Subsequent sections of this research article discuss the methodologies employed in the simulations and the results obtained.

2 Methods

In this study, we have adapted a DRL framework to train STS controllers with and without exoskeleton assistance. These controllers are engineered to mimic a target STS motion while fulfilling additional task objectives, steered by specifically designed rewards. An overview of the DRL controller framework used in this study is shown in Fig. 1. Our DRL training environment adapts the two-level imitation learning structure initially developed by Lee et al. [32]. It consists of a trajectory mimicking control policy network that outputs desired joint angles and a muscle coordination neural network that produces individual muscle excitations to generate desired torques. A key innovation in our work is the integration of exoskeleton assistive torques into the control networks. These assistive torques work in tandem with the muscle forces to generate the desired torques. Additionally, we have implemented specially designed balance rewards to enhance the assisted STS motion’s performance.

Fig. 1

Overview of the reinforcement learning controller framework for sit-to-stand motion control

2.1 Musculoskeletal model and idealized torque assistance

The MSK model utilized in this study was adapted from the gait10dof18.osim model, retrieved from the OpenSim model repository [33]. For faster computation and efficient learning, the model was modified by removing the left lower extremity and its associated muscles, assuming symmetry. This adjustment reduced the model to seven degrees of freedom (DOF), namely a 3-DOF planar pelvis joint, a 1-DOF lumbar joint, and one DOF each for the hip, knee, and ankle joints, with nine muscles on the right limb. Additionally, the torso mass was halved to account for the removed lower extremity. The original Millard muscles in the model were converted to MuJoCo-type muscles [34] for computational efficiency, where the tendons are modeled as rigid components. The physical parameters of each muscle (such as fiber length and maximum muscle force) were loaded from the original OpenSim model without modification. The maximum isometric forces of these muscles as used in the simulations are shown in Table 1. The knee joint was simplified to a single revolute joint, and the attachment points for the vasti and rectus femoris muscles were adjusted to maintain comparable moment arms, especially when the knee is highly flexed. Further, the erector spinae longissimus and rectus abdominis muscles were added to the lumbar joint to act as simplified versions of the lumbar musculature and enable control over trunk orientation. Three contact spheres were added to the foot, one at the heel and two at the toes, to model the contact between the foot and the ground. Additionally, one contact sphere placed close to the ischial tuberosity of the pelvis and one sphere on the thigh were used to model the contact between the buttocks and the chair. The final model in standing and seated positions is shown in Fig. 2.

Fig. 2

(a) The final musculoskeletal model with 11 muscles: erector spinae longissimus (ESL), rectus abdominis (RA), gluteus maximus (GMAX), iliopsoas (IL), hamstrings (HAMS), rectus femoris (RF), vasti (VAS), biceps femoris short head (BFSH), gastrocnemius (GAS), soleus (SOL), and tibialis anterior (TA). Contact spheres are placed at the foot, thigh, and back. (b) The MSK model in a seated relaxed initial position. The displayed muscle cross-section size indicates its maximum isometric force

Table 1 The maximum isometric force for each muscle included in the musculoskeletal model (unit: Newtons)

The muscle activation (\(a\)) in all the 11 human muscles of the model is governed by the first order excitation-activation dynamics equation as follows:

$$ \dot{a} = \frac{u-a}{\tau(u,a)}, \tau(u,a) = \textstyle\begin{cases} \tau_{act}(0.5+1.5a) & u -a >0\\ \tau_{deact}/(0.5+1.5a)&u-a\leq0 \end{cases}\displaystyle , $$
(1)

where \(u\) is the muscle excitation (the control signal obtained from the muscle network output) and \(\tau \) is the delay time. \(\tau _{act}\) and \(\tau _{deact}\) are the muscle activation and de-activation time constants, set to 0.01 and 0.04, respectively. This equation is solved through numerical integration, with both excitation and activation values restricted to the range [0,1].
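To make Eq. (1) concrete, the following is a minimal sketch of how the activation state could be advanced in time. It uses a plain forward-Euler step, which is an assumption made for illustration (the integrator actually used by the simulator is not stated here), and it assumes that the time constants are expressed in seconds. The function names and the step size `dt` are hypothetical.

```python
def activation_time_constant(u, a, tau_act=0.01, tau_deact=0.04):
    """Activation-dependent time constant tau(u, a) from Eq. (1)."""
    scale = 0.5 + 1.5 * a
    # Rising activation uses tau_act * scale, falling uses tau_deact / scale.
    return tau_act * scale if (u - a) > 0 else tau_deact / scale


def step_activation(u, a, dt):
    """One forward-Euler step of da/dt = (u - a) / tau(u, a), clamped to [0, 1]."""
    a_next = a + dt * (u - a) / activation_time_constant(u, a)
    return min(max(a_next, 0.0), 1.0)


def simulate_activation(excitations, a0=0.0, dt=0.001):
    """Integrate the activation over a sequence of excitation samples."""
    a = a0
    trace = []
    for u in excitations:
        a = step_activation(u, a, dt)
        trace.append(a)
    return trace


if __name__ == "__main__":
    # Step excitation held at 1.0 for 0.2 s (assumed units of seconds).
    trace = simulate_activation([1.0] * 200)
    print(f"activation after 0.2 s: {trace[-1]:.4f}")
```

With these values the activation rises faster than it decays, since the activation time constant is smaller than the de-activation time constant over most of the activation range.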

The dynamics of the human musculoskeletal model are represented in joint space and are governed by the Euler–Lagrange equations in generalized coordinates:

$$ \boldsymbol{M} \left ( \boldsymbol{q} \right ) \ddot{\boldsymbol{q}} + \boldsymbol{C} \left ( \boldsymbol{q}, \dot{\boldsymbol{q}} \right ) = \boldsymbol{J}_{\boldsymbol{M}}^{\boldsymbol{T}} \boldsymbol{F}_{\boldsymbol{M}} + \boldsymbol{J}_{\boldsymbol{ext}}^{\boldsymbol{T}} \boldsymbol{F}_{\boldsymbol{ext}} + \boldsymbol{\tau}_{exo}. $$
(2)

Here, \(\boldsymbol{q}\), \(\dot{\boldsymbol{q}}\), \(\ddot{\boldsymbol{q}}\) are the joint angles, angular velocities, and angular accelerations, respectively. \(\boldsymbol{F}_{M}\) are the muscle forces, which depend on the muscle activations, and \(\boldsymbol{F}_{ext}\) are the external forces (e.g., contact forces) acting on the musculoskeletal model. \(\boldsymbol{M} ( \boldsymbol{q} )\) is the generalized mass matrix, and \(\boldsymbol{C} ( \boldsymbol{q}, \dot{\boldsymbol{q}} )\) accounts for the Coriolis and gravitational forces. The Jacobian matrices \(\boldsymbol{J}_{M}\) and \(\boldsymbol{J}_{ext}\) convert the muscle forces and external forces into generalized joint torques. \(\boldsymbol{\tau}_{exo}\) is the idealized exoskeleton assistance torque. In our implementation, the assistance torque at each joint is modeled as a pair of agonistic and antagonistic actuators that provide either flexion or extension assistance.
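One plausible way to picture the agonist/antagonist actuator pair is sketched below. This is an illustration under stated assumptions rather than the implementation used in the study: each actuator is assumed to take a control in [0, 1], extension is taken as the positive direction, and the net torque is the difference of the two scaled by the 50 Nm per-joint limit used in the simulations. The function name `exo_torque` and the sign convention are hypothetical.

```python
def exo_torque(u_ext, u_flex, tau_max=50.0):
    """Net assistive torque (Nm) from an extension/flexion actuator pair.

    u_ext, u_flex: assumed actuator controls, clipped to [0, 1].
    tau_max: per-joint torque limit (50 Nm in the simulations).
    Positive output is extension assistance (assumed sign convention).
    """
    u_ext = min(max(u_ext, 0.0), 1.0)
    u_flex = min(max(u_flex, 0.0), 1.0)
    return tau_max * (u_ext - u_flex)


if __name__ == "__main__":
    print(exo_torque(1.0, 0.0))   # saturated extension assistance
    print(exo_torque(0.2, 0.5))   # net flexion assistance
```

Because each control is bounded, the net torque can never exceed the limit in either direction, which matches the saturation behavior described for the assistive torque profiles.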

The dynamics of the musculoskeletal model are integrated using a forward dynamics approach with the muscle excitations as obtained from the muscle coordination neural network as part of the DRL framework. Kinematic constraints such as the hip and knee joint limits are imposed, and the contact forces are solved using the open-source Dynamic Animation and Robotics Toolkit (DART) simulation environment during the forward simulations [35].

2.2 Reinforcement learning for sit to stand muscle control

The MSK model interacts with the ground and the seating box in the learning environment, which is the dynamic simulator. The control of this environment is realized through a combination of two multilayer perceptron (MLP) neural networks as shown in Fig. 1: the control policy network (CPN) and the muscle coordination network (MCN). The agent (CPN) takes the human body state information as input, then outputs the desired joint angles as the action. The desired joint angles are thereafter converted into desired joint torques (\(\boldsymbol{\tau}_{d}\)) through a proportional-derivative (PD) controller [36].
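The conversion from desired joint angles to desired joint torques can be sketched as a textbook PD law. This is a simplified illustration, assuming a zero target velocity and hypothetical gains `kp` and `kd`; the specific PD formulation of [36] may differ.

```python
import numpy as np


def pd_torque(q_des, q, q_dot, kp=300.0, kd=30.0):
    """Desired joint torques tau_d = kp * (q_des - q) - kd * q_dot.

    q_des: desired joint angles output by the policy (rad).
    q, q_dot: current joint angles (rad) and velocities (rad/s).
    kp, kd: hypothetical proportional and derivative gains.
    """
    q_des = np.asarray(q_des, dtype=float)
    q = np.asarray(q, dtype=float)
    q_dot = np.asarray(q_dot, dtype=float)
    return kp * (q_des - q) - kd * q_dot


if __name__ == "__main__":
    tau_d = pd_torque(q_des=[1.0, 0.5], q=[0.0, 0.5], q_dot=[0.0, 1.0])
    print(tau_d)
```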

The MCN neural network used for learning muscle excitations is a deterministic policy \(a = \pi _{\psi} ( \boldsymbol{\tau}_{d}, s_{muscle} )\), where the network parameters \(\psi \) are learned through regression by supervised learning. The muscle coordination network is defined with three hidden layers (n = 512, 256, 256 nodes) and the loss function is given by

$$ Loss(a(\psi ))= \mathbb{E} \left [ \left \Vert \boldsymbol{\tau}_{\boldsymbol{d}} - \boldsymbol{\tau}_{\boldsymbol{m}} - \boldsymbol{\tau}_{\boldsymbol{exo}} \right \Vert ^{2} + w_{reg} \left \Vert \boldsymbol{a} (\psi ) \right \Vert ^{2} \right ]. $$
(3)

Here, the first term minimizes the difference between the desired torques (\(\boldsymbol{\tau}_{d}\)) and the sum of the biological joint torques (\(\boldsymbol{\tau}_{m}\)) and the exoskeleton torques (\(\boldsymbol{\tau}_{exo}\)). The second term is a regularization term that reduces large muscle activations. The MCN predicted \(\boldsymbol{a} ( \psi )\) is fed to the simulation environment as the muscle excitation instead of activation since the activation must obey Eq. (1).
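As an illustration of Eq. (3), the loss can be evaluated on a batch of samples as follows. This is a NumPy sketch of the loss value only, not the training code; the regularization weight `w_reg` is not specified in the text, so the default used here is a placeholder.

```python
import numpy as np


def mcn_loss(tau_d, tau_m, tau_exo, activations, w_reg=0.01):
    """Batch mean of ||tau_d - tau_m - tau_exo||^2 + w_reg * ||a||^2 (Eq. 3).

    tau_d, tau_m, tau_exo: arrays of shape (batch, n_joints).
    activations: array of shape (batch, n_muscles), network output a(psi).
    w_reg: regularization weight (placeholder value, assumed).
    """
    residual = np.asarray(tau_d) - np.asarray(tau_m) - np.asarray(tau_exo)
    torque_term = np.sum(residual ** 2, axis=-1)
    reg_term = w_reg * np.sum(np.asarray(activations) ** 2, axis=-1)
    return float(np.mean(torque_term + reg_term))


if __name__ == "__main__":
    # Muscles and exoskeleton together exactly supply the desired torque,
    # so only the activation penalty remains.
    loss = mcn_loss([[10.0, 20.0]], [[6.0, 15.0]], [[4.0, 5.0]], [[1.0, 1.0]], w_reg=0.5)
    print(loss)
```

The sketch makes the role of the exoskeleton term visible: any torque supplied by the exoskeleton reduces the torque the muscles must produce to drive the first term to zero.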

The CPN acts as the main RL agent controlling the MSK model’s actions based on its accumulated rewards. As the RL agent interacts with its environment, its actions are scored using a reward, and the agent is updated based on the action’s reward. At each time step \(t\), the agent’s state \(s_{t}\) is observed and an action \(a_{t}\) is selected according to its control policy \(\pi _{\theta} ( a_{t} \mid s_{t} )\), with \(\theta \) being the weights and bias of the neural network. The control policy is learned by maximizing the discounted sum of reward (\(r_{t}\)).
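For reference, the discounted sum of rewards that the policy maximizes can be computed for a single episode as in the sketch below. The discount factor `gamma` is not stated in the text, so the default here is an assumed, typical value.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum_t gamma^t * r_t for one episode, accumulated backwards.

    rewards: per-step rewards r_0, r_1, ..., r_T.
    gamma: discount factor (assumed value; not specified in the text).
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```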

The DRL framework is trained with the proximal policy optimization (PPO) algorithm [37], which is a model-free policy gradient algorithm widely used for continuous control problems. PPO updates the control policy’s parameters (\(\theta \)) using the expected return’s gradient with respect to the parameters. The agent learns to increase its reward by modifying the parameters \(\theta \) of the network. The CPN is defined as an MLP with two hidden layers with 256 nodes each. A desired target trajectory is provided as reference.

The total reward function \(r_{t}\) for the RL algorithm is designed to drive the MSK model toward the target state. It consists primarily of a torque reward \(r_{t}^{torq}\), a tracking reward \(r_{t}^{track}\), and an extrapolated center of mass (XcoM) [38] stability reward \(r_{t}^{xcom}\). Additionally, the simulation introduces a reward for maintaining an upright posture \(r_{t}^{up}\) and a reward for minimizing velocity at the end of the movement \(r_{t}^{vel}\). These two rewards are activated at the 2-second mark to ensure that the posture is upright and the movement velocity is minimized at the conclusion of the simulation. The total reward is defined as follows:

$$ r_{t} = w^{torq} r_{t}^{torq} + w^{track} r_{t}^{track} + w^{xcom} r_{t}^{xcom} + \left ( w^{up} r_{t}^{up} + w^{vel} r_{t}^{vel} \right )_{\text{if } t \geq 2}, $$
(4)

where \(w^{torq} =0.1\), \(w^{xcom} =0.1\), \(w^{track} =1.0\), \(w^{up} =1.0\), and \(w^{vel} =0.5\). The torque reward is included to help reduce the energy consumption of the joints by minimizing the torques:

$$ \textstyle\begin{array}{l} r_{t}^{torq} = exp \left ( - \sigma _{torq} \sum _{j} \left \Vert \tau _{j} \right \Vert ^{2} \right ), \end{array} $$
(5)

where \(\sigma _{torq} =0.001\). The tracking reward minimizes the difference between the reference trajectories and the controller prescribed angles. The experimental reference STS motion capture joint angle data used in this study for trajectory mimicking is obtained from inverse kinematics solutions as determined by Caruthers et al. [11]. Our goal was not to achieve perfect tracking but to use the tracking data as general guidance and demonstrate that our RL based simulation can generate physically feasible and realistic motion even when a nonspecific motion is used for tracking. The tracking reward is defined as follows:

$$ \textstyle\begin{array}{l} r_{t}^{track} = exp \left ( - \sigma _{q} \sum _{j} \left \Vert \hat{q}_{t}^{j} - q_{t}^{j} \right \Vert ^{2} \right ), \end{array} $$
(6)

where \(\sigma _{q} =2.0\), \(q_{t}^{j}\) is the DOF value of the jth joint, and \(\hat{q}_{t}^{j}\) is the corresponding DOF value for the tracking motion. The XcoM reward is defined as

$$ r_{t}^{xcom} = exp \left ( - \sigma _{xcom} \left \Vert xcom_{t} - xcom_{target} \right \Vert ^{2} \right ), $$
(7)

where \(\sigma _{xcom} =\ 40\) and \(xcom_{target}\) is set as the \(x\) (horizontal) position of the foot’s COM. The upright posture reward is defined as follows:

$$ r_{t}^{up} = exp \left ( - \sigma _{up} \left \Vert p_{head}^{x} - p_{pelvis}^{x} \right \Vert ^{2} \right ), $$
(8)

where \(\sigma _{up} = 100\), \(p_{pelvis}^{x} \) is the \(x\) position of the pelvis, and \(p_{head}^{x} \) is the \(x\) position of the head, which equals \(p_{pelvis}^{x} \) when the torso is fully upright. The velocity reward at the target posture promotes stability once standing is achieved and is defined as follows:

$$ \textstyle\begin{array}{l} r_{t}^{vel} = exp \left ( - \sigma _{vel} \sum _{j} \left \Vert \dot{\hat{q}}_{t}^{j} - \dot{q}_{t}^{j} \right \Vert ^{2} \right ), \end{array} $$
(9)

where \(\sigma _{vel} =2\) and \(j\) is the joint index. \(\dot{\hat{q}}_{t}^{j}\) is the joint velocity value for the standing posture, which is set to zero. All the reward terms used in the optimization framework are summarized in Table 2 with their respective weights.

Table 2 Reward terms used in the DRL framework and respective weights
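The reward terms in Eqs. (4) to (9) can be combined as in the following sketch, which uses the weights and scale factors quoted in the text. It is an illustration, not the training code: the function and argument names are hypothetical, and the helper `extrapolated_com` uses the standard inverted-pendulum definition of the XcoM (COM position plus COM velocity divided by the pendulum eigenfrequency), with the pendulum length left as an assumed input.

```python
import math
import numpy as np

# Weights and scale factors as quoted in the text.
WEIGHTS = {"torq": 0.1, "track": 1.0, "xcom": 0.1, "up": 1.0, "vel": 0.5}
SIGMAS = {"torq": 0.001, "track": 2.0, "xcom": 40.0, "up": 100.0, "vel": 2.0}


def extrapolated_com(com_x, com_vx, pendulum_length, g=9.81):
    """XcoM = x + v / omega0, with omega0 = sqrt(g / l) (inverted pendulum)."""
    omega0 = math.sqrt(g / pendulum_length)
    return com_x + com_vx / omega0


def exp_reward(sigma, error):
    """Shared form exp(-sigma * sum(error^2)) used by every reward term."""
    return float(np.exp(-sigma * np.sum(np.square(error))))


def total_reward(t, tau, q, q_ref, xcom, xcom_target,
                 head_x, pelvis_x, q_dot, t_gate=2.0):
    """Weighted sum of Eq. (4); posture and velocity terms apply for t >= t_gate."""
    r = WEIGHTS["torq"] * exp_reward(SIGMAS["torq"], tau)
    r += WEIGHTS["track"] * exp_reward(SIGMAS["track"], np.subtract(q_ref, q))
    r += WEIGHTS["xcom"] * exp_reward(SIGMAS["xcom"], xcom - xcom_target)
    if t >= t_gate:
        r += WEIGHTS["up"] * exp_reward(SIGMAS["up"], head_x - pelvis_x)
        # The target joint velocity at standing is zero, so the error is q_dot.
        r += WEIGHTS["vel"] * exp_reward(SIGMAS["vel"], q_dot)
    return r


if __name__ == "__main__":
    xcom = extrapolated_com(com_x=0.05, com_vx=0.1, pendulum_length=1.0)
    r = total_reward(2.5, tau=[10.0, 5.0], q=[0.1, 0.2], q_ref=[0.1, 0.25],
                     xcom=xcom, xcom_target=0.05, head_x=0.02, pelvis_x=0.0,
                     q_dot=[0.05, 0.0])
    print(f"total reward: {r:.3f}")
```

Since every term is bounded by its weight, the total reward is at most 1.2 before the 2-second mark and at most 2.7 afterwards.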

Moreover, early termination conditions [39] are imposed to ensure faster learning. The termination conditions include the detection of a fall (imposed by specifying a lower bound on the vertical position of the pelvis), as well as the detection of a toe or heel lift or of sufficiently large foot sliding. All training was executed on a Linux machine equipped with Intel Xeon CPUs (2.30 GHz) and a 16 GB Nvidia Quadro RTX 5000 GPU. Each training session involved a maximum of 50,000 iterations, typically requiring approximately 40 hours to reach completion. Notably, the rewards often plateaued well before the 50,000-iteration mark, indicating the expected convergence of the training process.
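The early termination logic described above can be sketched as a simple predicate evaluated at every simulation step. All threshold values below are hypothetical placeholders chosen for illustration; the thresholds used in training are not given in the text.

```python
def should_terminate(pelvis_height, heel_height, toe_height, foot_slide,
                     min_pelvis_height=0.3, max_lift=0.02, max_slide=0.05):
    """Return True if the episode should end early.

    pelvis_height: vertical pelvis position (m); a fall is flagged below the bound.
    heel_height, toe_height: heights of the foot contact points above ground (m).
    foot_slide: horizontal foot displacement from its initial position (m).
    All thresholds are hypothetical placeholder values.
    """
    fell = pelvis_height < min_pelvis_height
    foot_lifted = heel_height > max_lift or toe_height > max_lift
    slid = abs(foot_slide) > max_slide
    return fell or foot_lifted or slid


if __name__ == "__main__":
    print(should_terminate(0.9, 0.0, 0.0, 0.01))   # nominal standing state
    print(should_terminate(0.2, 0.0, 0.0, 0.0))    # pelvis below the fall bound
```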

3 Results

After obtaining trained controllers for each of the four cases, we conducted forward dynamic simulations of STS with these controllers to test their performance. Since the CPN is a stochastic control policy, we conducted 100 dynamic simulations for each case to obtain the mean responses. The initial state for these test simulations was the same as the one used in training (the IK results at t = 0 s obtained from Caruthers et al. [11]). The variance in the tests was negligible; for example, the standard deviations in hip, knee, and joint angles for the unassisted case were all below 0.25 degrees. We did not observe any failed cases (resulting in a fall) in these dynamic simulations, underscoring the robustness of the controllers.

One example test case dynamic simulation for the STS motion for the unassisted condition is illustrated in Fig. 3 through a time-lapse sequence of screenshots of the model performing the motion.

Fig. 3

An example test case of the musculoskeletal model performing the STS transition for the unassisted condition. Maximum hip and lumbar flexion occurs at t = 1.24 s. The hip velocity reaches zero at t = 2.41 s, and the simulation ends at t = 3 s. The color of the muscles indicates the muscle activation with red being maximum (1.0) and blue being minimum (0.0) (Color figure online)

The mean exoskeleton assistive hip and knee joint torques for the assisted STS scenarios, each averaged over the 100 dynamic simulations, are presented in Fig. 4. An upper limit of 50 Nm assistive torque per joint was imposed during the simulations. We chose 50 Nm for the knee and hip joints to provide substantial but partial support, based on the typical joint torque requirements during STS movements reported in biomechanical studies such as Roebroeck et al. [10] and Yoshioka et al. [40]. This level of torque strikes a balance between enabling the necessary movement and encouraging user effort to prevent muscle atrophy, while also considering the state-of-the-art torque capabilities of lightweight and compliant motors for future physical implementation. The hip and knee joint assistive torques stayed below this limit throughout the H+K-Exo scenario. In both the H-Exo and K-Exo scenarios, the assistive extension torque saturated at the 50 Nm upper limit for brief periods following maximum hip/lumbar flexion (i.e., the time point at which the hip and knee start to extend toward the standing position). The assistive torque profiles in all assistive scenarios showed a relatively smooth progression with a few small abrupt changes.

Fig. 4

Mean assistive torques for different assistive scenarios. Note that the joint assistive torques were limited to a maximum of 50 Nm in either direction (extension or flexion) during the simulations. The standard deviations in the assistive torque profiles were negligibly small and therefore they are not displayed in the figure (Color figure online)

The average joint angles observed for each scenario are presented in Fig. 5. The angles are compared with experimental IK results obtained from Caruthers et al. [11]. A delayed maximum hip flexion timing compared to experimental data was observed across all the four scenarios. Further, the lumbar flexion peak was shifted for the H-Exo and H+K-Exo scenarios with large lumbar flexion angle as high as twice the observed experimental mean. The knee angles across all scenarios were comparable to the experimental trajectories, while the occurrence of the peak dorsi-flexion and plantar-flexion angles in the ankle were both delayed, and the overall ankle angle range was much smaller.

Fig. 5

The mean joint angles for (a) lumbar, (b) hip, (c) knee, and (d) ankle across the 100 dynamic simulations for each STS scenario in comparison to the tracked and experimental joint angles. The experimental mean is shown by the solid black lines, while the experimental mean ± SD region is shaded in gray. The joint trajectory that was tracked during training is shown in dashed lines (Caruthers et al., subject ID 1106231). The first solid vertical line indicates the instant of maximum hip/lumbar flexion in the tracking data. The second solid vertical line indicates the instant at which a stable standing posture is achieved in the tracking data (Color figure online)

The trajectory of the whole-body COM for all four scenarios, presented in Fig. 6(a), shows a slight backward shift in overall body position at the end of the STS for the K-Exo and H+K-Exo scenarios. The kinematic and COM trajectories indicate a slightly larger forward lean in the assisted cases compared to the unassisted baseline. The contact forces observed at the ground and the seat are shown in Fig. 6(b, c). These forces are relatively smooth and consistent across the unassisted and assisted scenarios. The contact forces obtained at the seat are comparable to the simulated observations of Munoz et al. [15]. Our ground reaction forces are closer to their experimental results than to their simulation results, which follow a sharper S-shaped profile without a clear midway peak.

Fig. 6

(a) The mean center of mass trajectories during STS for all four scenarios (units are in meters). (b) The mean vertical ground reaction forces observed during STS. (c) The mean seat contact forces (Color figure online)

The mean muscle activation patterns for all 11 muscles in all four STS control scenarios are presented in Fig. 7. The peak activations in the GMAX and VAS muscles are reduced by [−77% (i.e., an increase), 0%] in the H-Exo scenario, [3.6%, 12.1%] in the K-Exo scenario, and [60.1%, 55.7%] in the H+K-Exo scenario, respectively. The activation of the SOL muscle is near minimal, except for very small activations during the stabilizing phase of the unassisted scenario. Similarly, the activation of the BFSH muscle is very low other than for a small portion of the unassisted scenario. Interestingly, the H-Exo assistance increases the activation levels in the GMAX, GAS, and TA muscles compared to the baseline. The K-Exo assistance also results in increased activations in some muscles (HAMS and GAS).

Fig. 7

Mean test muscle activation levels in the 11 muscles in all four STS scenarios. An activation of 1.0 corresponds to full activation (Color figure online)

The total percentage reductions in muscle activations compared to the unassisted condition over the whole 3-second time period are shown in Table 3. These values are obtained by comparing the average muscle activations (computed as the areas under the activation curves in Fig. 7 divided by the total time). Moreover, no considerable passive forces were observed, indicating that the muscles do not undergo undue passive stretching even at large knee flexion angles.

Table 3 Percentage reductions in overall (mean) muscle activations compared to the unassisted STS condition. Negative values are indicative of a percentage increase in comparison to the unassisted baseline. The soleus muscle activations were very low and therefore are not presented in this table
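The percentage reductions can be reproduced from sampled activation curves as sketched below: the mean activation is the area under the curve divided by the total time, and the reduction is expressed relative to the unassisted mean. The trapezoidal rule and the function names are choices made for this illustration, and the sample curves are synthetic, not data from the study.

```python
import numpy as np


def mean_activation(t, a):
    """Area under the activation curve (trapezoidal rule) divided by total time."""
    t = np.asarray(t, dtype=float)
    a = np.asarray(a, dtype=float)
    area = np.sum(0.5 * (a[1:] + a[:-1]) * np.diff(t))
    return area / (t[-1] - t[0])


def percent_reduction(t, a_unassisted, a_assisted):
    """Percentage reduction in mean activation; negative values mean an increase."""
    base = mean_activation(t, a_unassisted)
    return 100.0 * (base - mean_activation(t, a_assisted)) / base


if __name__ == "__main__":
    t = np.linspace(0.0, 3.0, 301)          # 3-second simulation window
    a_unassisted = np.full_like(t, 0.4)     # synthetic curves for illustration
    a_assisted = np.full_like(t, 0.3)
    print(f"{percent_reduction(t, a_unassisted, a_assisted):.1f}% reduction")
```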

The progression of each individual reward component (the raw reward value multiplied by its weight) over the simulation period is presented in Fig. 8. The tracking, upright, and velocity rewards contribute more significantly to the total reward during the upright standing stabilization phase, whereas the torque and XcoM rewards have a lesser impact but also increase right before and during standing. It is evident that the XcoM reward plays an important role during the rapid transition phase, while tracking is less prioritized during this segment than in other portions of the movement. The variability in the rewards across conditions is minimal, and the total reward stabilizes at the end of the simulation period. The near-zero values of the torque minimization reward right before liftoff indicate a large control demand on the muscles and the idealized actuators. Further, the controllers’ difficulty in imitating the joint kinematics (resulting in extensive lumbar flexion) in the H-Exo and H+K-Exo conditions is visible in the lower tracking reward values during the rising phase. The controllers’ struggle to achieve complete static standing stability is also evident from the changing XcoM and velocity reward values in the assisted conditions.

Fig. 8

Progression of the reward term values over the simulation period for all STS scenarios. The terms include their respective weights and are the means over the 100 simulations per case (Color figure online)

4 Discussions

STS transition is undeniably an important component of daily life and results in severe loss of independence when impaired. Identifying the ideal level of joint assistance necessary to minimize effort and still preserve adequate stability in the STS motion is highly beneficial. In this study, we have engineered and presented four distinct DRL-based controllers for facilitating both unassisted and assisted (hip, knee, and combined hip-knee) STS transitions in a controlled simulation environment. Each controller underwent extensive evaluation through one hundred dynamic simulations for each scenario. Future work should involve implementing these controllers on a physical hip-knee assistive device and studying the biomechanical changes to conclusively verify their effectiveness and robustness through comprehensive hardware-equipped experimental validation.

Our predicted STS joint angles for the unassisted case generally align with those reported by Caruthers et al. [11], with two notable exceptions: hip flexion is greater and ankle dorsiflexion is less pronounced, though both remain within the ranges observed in other studies. Specifically, while our hip flexion exceeds the levels reported by Caruthers et al., our maximum angles do not surpass 110 degrees and thus remain within the 130-degree maximum hip flexion angle reported by Schenkman et al. [41]. Our results for ankle dorsiflexion, albeit lower than those of Caruthers et al., are still comparable to other STS ankle trajectories, such as those reported by Kumar et al. [14]. We believe that the considerable variability in ankle angles reported across studies is likely due to differences in experimental conditions, such as seat height or initial foot position [42]. It is plausible that the increased hip flexion serves as a compensatory adaptation, improving the alignment of the body’s center of mass over the feet when ankle dorsiflexion is relatively small.

While tracking the provided reference STS motion is one of the primary rewards, the DRL policy also accounts for competing balance rewards, potentially at the expense of tracking accuracy. Given the differences between our generic MSK model and the model of Caruthers et al., strictly tracking the reference trajectory is unlikely to be feasible. The MSK model was not scaled anthropometrically, and the maximum isometric forces of the muscles were not updated to match the subject from Caruthers et al. who provided the tracking data. Notably, the ankle joint angle of the reference trajectory lies outside the mean ± SD region of the experimental data of Caruthers et al., but our predicted trajectory adjusted for this during the standing phase of the unassisted motion. Using the reference trajectory as general guidance, we show that our RL-based simulation can generate physically feasible motion for the current MSK model. This is achieved despite differences in anthropometry and muscle characteristics between the MSK model and the real subject, which are important factors for tracking accuracy. It is plausible that the controller may produce very different or even infeasible motion if the muscle capacity is insufficient (e.g., in cases of severe muscle weakness or disability).

The observed activations of the soleus (SOL) muscle during unassisted STS are minimal, consistent with the findings of Caruthers et al. [11], whose static optimization results also showed low SOL activations. Munoz et al. noted higher experimental activations in the SOL muscle, while the activations yielded by their reflexive-controller-based simulations were low, mirroring our DRL-based observations [15]. Similarly, Kumar et al. reported minimal SOL activations in their simulations [14]. Our TA activation pattern and level closely align with existing literature [11] despite the slightly reduced dorsiflexion angle. In the unassisted scenario, short-duration saturation in activation was observed for the IL and VAS muscles. The VAS is noted to reach high activation levels during the peak of the extension phase [14, 15], whereas no corresponding experimental STS data were found for IL activation. The RA and ESL muscles exhibit prolonged full activation in our simulation, likely due to their oversimplified modeling, which was only intended to produce the necessary lumbar moments. In practice, activation levels for the RA and ESL should be lower, with mean activations typically up to 55% and 70%, respectively, during fast STS maneuvers [43]. This suggests that the current RA and ESL muscles in our model might be too weak, and a higher maximum isometric force might be more appropriate. The high activation of the ESL could also result from the fast movement of the COM during the extension phase of the predicted motion. With hip assistance, the ESL activation is increased compared to the unassisted case. One possible explanation is that, while hip assistance reduced hip muscle torques, the ESL may need to increase its activation to control the movement of the pelvis and torso.

In general, all the assistive cases resulted in a delayed occurrence of the maximum lumbar and hip flexion angles, as well as a deeper flexion before seat-off. Hip assistive torques in both the H-Exo and H+K-Exo scenarios increased the maximum lumbar flexion angle by around 50% compared to baseline and resulted in slightly lower peak hip flexion angles. Increased lumbar flexion has been identified as a stabilization strategy in STS [14], suggesting that the larger lumbar motion we observe could be a compensatory mechanism to maintain balance when hip assistive torque is provided. The activations of the two lumbar muscles, the ESL and RA, were observed to saturate, indicating potentially insufficient muscle strength that resulted in large lumbar flexion. Similarly, the hip assistive torques in these two assistive scenarios also resulted in a higher ankle plantar flexion angle when rising from the chair to a standing pose compared to the unassisted STS ankle angle. The trajectory through which the knee angle extended remained relatively consistent across the unassisted and all the assistive STS scenarios.

The H+K-Exo assistive case is effective in reducing the activations of all muscles in comparison to baseline, except for the TA, RA, and ESL. In particular, considerable reductions in activation are observed in the major STS hip and knee extension contributors (VAS, 73.23%; GMAX, 68.63%; HAMS, 58.21%; RF, 92.32%). However, the H-Exo and K-Exo scenarios result in increased muscle activations in the HAMS, GAS, and TA muscles. Assisting the knee joint alone (K-Exo) seems to introduce some instability after seat-off that is countered by additional activation of the GAS and TA muscles (major ankle plantarflexion and dorsiflexion muscles, respectively). Similar observations, in which muscles surrounding unassisted joints increase their activation at higher assistance levels, have been reported in emulator-based experimental studies of ankle-assisted gait [44]. The biarticular hamstrings and the rectus femoris are known to exhibit significant co-contraction during STS [45]. This is also observable to some extent in the biarticular HAMS and RF of our model right after seat-off in the unassisted and K-Exo scenarios.
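For readers reproducing comparisons of this kind, a percentage reduction relative to baseline can be computed as in the sketch below. It assumes each condition is summarized by a single scalar per muscle (here the mean activation over the STS cycle); the summary metric behind the percentages quoted above is not restated in this section, and the activation samples in the example are made up for illustration, not taken from our results.

```python
from statistics import mean

def percent_reduction(baseline, assisted):
    """Reduction of `assisted` relative to `baseline`, in percent.

    Positive values mean the assisted activation is lower than baseline;
    negative values mean it increased.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - assisted) / baseline

# Illustrative (made-up) activation samples in [0, 1] for one muscle.
unassisted = [0.2, 0.6, 0.5, 0.3]
with_exo = [0.1, 0.1, 0.1, 0.1]

reduction = percent_reduction(mean(unassisted), mean(with_exo))
print(f"{reduction:.1f}% reduction")
```

A negative result corresponds to the increases noted for the HAMS, GAS, and TA muscles in the single-joint scenarios.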

The torque profiles acquired from our simulations demonstrated peak assistive torque values approximately at the midpoint of the STS motion and displayed a generally smooth profile with some oscillations and few abrupt direction changes. These profiles can be readily parameterized after minor adjustments and implemented on a real-world hip-knee exoskeleton through sim-to-real transfer as required. Similarly, parameterized smooth torque profiles such as cubic splines or linear portions are often used as the basis for specifying assistive torques in exoskeletons during gait [46] or other activities such as squats [47]. This is particularly common in exoskeleton prototype emulator-based human-in-the-loop studies, where such torque profiles are optimized by altering these parameters based on human physiological objectives [48] or user preferences [49]. Our framework generates generic torque profiles that could potentially be used as initial references for subject-specific assistance optimization in human-in-the-loop experiments, pending validation on real-world hardware. By serving as a starting point, these profiles enable the customization of assistive strategies to meet individualized biomechanical needs of users during STS transitions. This structured approach presents a promising avenue for developing more personalized and effective exoskeleton assistance for STS in the future.
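As a rough illustration of what such a parameterization might look like, the sketch below describes a single-peak torque profile with four parameters (peak torque, peak time, rise time, fall time) joined by cubic segments with zero slope at onset, peak, and offset. This is an assumed form in the spirit of the spline-based profiles cited above, not the profile produced by our controller, and all numerical values are hypothetical.

```python
def smoothstep(s):
    """Cubic ramp from 0 to 1 with zero slope at both ends; input is clamped to [0, 1]."""
    s = min(max(s, 0.0), 1.0)
    return s * s * (3.0 - 2.0 * s)

def torque_profile(t, peak_torque, peak_time, rise_time, fall_time):
    """Assistive torque at normalized STS time t in [0, 1] (hypothetical four-parameter form)."""
    if rise_time <= 0 or fall_time <= 0:
        raise ValueError("rise_time and fall_time must be positive")
    onset = peak_time - rise_time
    offset = peak_time + fall_time
    if t <= onset or t >= offset:
        return 0.0
    if t <= peak_time:
        return peak_torque * smoothstep((t - onset) / rise_time)
    return peak_torque * (1.0 - smoothstep((t - peak_time) / fall_time))

# Example: a 40 N·m peak at the midpoint of the motion (values are illustrative only).
params = dict(peak_torque=40.0, peak_time=0.5, rise_time=0.3, fall_time=0.4)
profile = [torque_profile(i / 100.0, **params) for i in range(101)]
print(max(profile))
```

In a human-in-the-loop setting, an optimizer would adjust the four parameters rather than the full torque time series, which keeps the search space small.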

The presented framework could potentially be adapted to develop generic STS assistive controllers for clinical populations, such as individuals with neuromuscular disorders or the elderly, by incorporating factors like muscle strength deficits, activation-deactivation delays, and neuromuscular noise. A more complex MSK model and additional refinements to the rewards will likely be necessary, along with validation to ensure effectiveness before clinical application. For example, modeling conditions that are characterized by asymmetry, such as hemiparesis, requires a two-legged model with degrees of freedom represented in all three dimensions. In addition, through more comprehensive training of the controller, this approach could be broadened to account for different chair heights, environmental conditions such as slippery floors, the use of arm supports and crutches, or even unexpected perturbations.

The presented methodology has revealed promising avenues for the development of DRL-based STS assistance controllers, but it has several notable limitations and opportunities for future enhancement. Firstly, we modeled the STS assistance as idealized joint torques, omitting the exoskeleton inertial properties and the interaction forces between the exoskeleton and the human. Nonetheless, this allows us to focus on the DRL framework and on generating assistance profiles without tailoring them to any specific exoskeleton design. Secondly, the DRL controller was only trained on one generic symmetrical MSK model. While this approach provides initial insights, extending it to subject-specific MSK models with more realistic muscle models could potentially yield personalized assistive torque profiles. This limitation can potentially be further alleviated by incorporating domain randomization in the simulation, which would simulate a broader range of model conditions and variables, improving the realism and generalizability of the results [28]. Furthermore, our preliminary insights into the influence of hip and knee joint assistance on STS motion are encouraging, though they highlight areas where simulation fidelity could be improved and interpretation of results could be strengthened with statistical analysis. Enhancing the model by increasing maximum isometric force values in specific muscles (such as RA, ESL, VAS, and IL) could lead to more precise simulations and better reflect realistic STS motion. Moreover, adopting multiple trained controllers, each trained with different rewards and variations of the MSK model, could generate more diverse motions and responses to assistance. Reporting the mean and standard deviation of their results would provide a more comprehensive statistical analysis of possible solutions. Lastly, the relative weights of the reward terms used during learning were selected by trial and error. Adopting a more systematic approach, such as a grid search or inverse reinforcement learning [50] for reward shaping, could potentially further improve the learning of the controllers. Despite these limitations and potential improvements, our approach sets a strong foundation for future research. Rigorous real-world testing and experimental validation will be crucial in confirming the practical effectiveness and feasibility of these DRL-based controllers in STS assistance.
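To make the domain randomization idea concrete, the following sketch samples a new set of model parameters at the start of each training episode. It is an assumption-laden illustration rather than part of the present framework: the parameter names, the choice of randomized quantities (per-muscle maximum isometric force scale and seat height), and the sampling ranges are all hypothetical.

```python
import random

# Muscle abbreviations as used in this paper.
MUSCLES = ["IL", "GMAX", "HAMS", "RF", "VAS", "GAS", "SOL", "TA", "RA", "ESL"]

def sample_episode_params(rng, strength_range=(0.7, 1.3), seat_height_range=(0.40, 0.50)):
    """Draw one randomized model configuration (all ranges are illustrative assumptions).

    fmax_scale: multiplier applied to each muscle's nominal maximum isometric force.
    seat_height_m: chair seat height in meters.
    """
    lo, hi = strength_range
    return {
        "fmax_scale": {m: rng.uniform(lo, hi) for m in MUSCLES},
        "seat_height_m": rng.uniform(*seat_height_range),
    }

# A seeded generator keeps the sampled configurations reproducible across runs.
rng = random.Random(0)
episodes = [sample_episode_params(rng) for _ in range(3)]
for ep in episodes:
    print(round(ep["seat_height_m"], 3), round(ep["fmax_scale"]["VAS"], 3))
```

A policy trained across such sampled configurations is exposed to a range of strengths and seat heights rather than a single generic model, which is the mechanism by which domain randomization is expected to improve generalizability.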

5 Conclusions

This study presents a DRL framework for training robust muscle controllers and generating muscle-controlled STS movements while co-optimizing joint assistive torques. The resulting controllers demonstrate, in simulation, the ability to substantially reduce muscle activations in major lower limb muscles without introducing significant changes to kinematics or compromising stability, especially when the hip and knee joints are assisted together. Although the practical effectiveness of the predicted assistance torque profiles requires real-world testing and validation, the findings of this research provide valuable insights for developing similar DRL methodologies that yield robust controllers for assistive technologies, including exoskeletons and prosthetic devices.