1 Introduction

Fig. 1

The OffiStretch system provides guidance for stretching exercises in the form of multimodal pre-recorded instructions and real-time visual feedback on a simulated virtual mirror. With this research, we aim to enhance users’ motivation to perform physical exercises, help ensure correct posture for an optimal outcome, and support the automatic analysis of user performance

The shift toward home office work has strongly increased due to the pandemic and will likely persist in the future. This trend reduces people’s level of physical activity, as they lack the movement related to their commute to work, in-person meetings, and social activities with coworkers, and they often use office equipment and furniture that is not ergonomically optimal [1]. It is well established that physical inactivity and a sedentary lifestyle can have negative consequences for the general population [2,3,4]. As confirmed by our survey results, people are aware of this negative impact on their health, but they lack the immediate motivation and personal discipline to exercise.

Motivation can be increased by integrating gamification elements into physical exercise routines, as has been recently studied in connection with video games [5] and exergames. For example, Pacheco’s review [6] compares 12 user studies with older participants, concluding that exergames can significantly improve motivation, balance, and mobility. Andrade [7] reviewed studies related to children and adolescents with obesity and reported improvements in self-esteem and self-efficacy through the use of exergames compared to control groups. Soares [8] explored the effect of exergames on the cognitive abilities of older adults compared to conventional exercise. While he found no effect on cognitive function, the use of exergames seems to positively impact motivation. This is supported by Stadiano’s study [9] on the development of motivation through exergames.

Fitness trackers represent a further important factor, as they can help to improve motivation [10] by indicating progress towards reaching one’s training targets based on measured physical activity and tracked human movements [11]. Fitness trackers usually use GPS, inertial, and physiological sensors for tracking motion and exertion, to provide users with an estimate of their total physical activity during the day. Such systems can provide very good real-time or aggregated values of various biometric properties, like heart rate, step size, or running speed [12]. However, they cannot evaluate whether the user’s run was biomechanically correct or not. The same problem appears in the case of other exercises like workouts, where information from an inertial sensor on the user’s wrist is not enough to analyze the correctness of the movement for the best outcome and to prevent injury. So, while these devices are great to bolster motivation, they are limited with regard to accuracy for full-body movement measurements.

To address this issue for stationary exercise forms, we propose a vision-based approach using off-the-shelf components for evaluating the correctness of the user’s pose based on joint angles and distances between selected body parts. Additionally, we introduce interactive visual feedback that continuously indicates the correctness of the user’s pose in a simulated “digital mirror”. The digital mirror metaphor is realized using a regular screen that shows the mirrored live capture from a camera. We apply this approach in the context of stretching exercises, where we explore its potential for coaching users to stretch correctly and increasing their motivation for daily activity.

We hypothesize that such a digital assistant can integrate well into people’s daily home office routines and motivate stretching (H1, H2, see Sect. 4), and that our proposed feedback motivates and supports users to perform their stretches correctly (H3-H6, see Sect. 5). To explore these hypotheses, we first conducted an online survey to investigate users’ needs and preferences regarding digital coaching systems for stretching. This was followed by an on-site user study, in which we evaluated users’ performance and motivation in performing stretching exercises with and without our visual feedback. Finally, we validated our methods through an expert evaluation with professional physiotherapists. Figure 1 provides a visual representation of the presented system in action.

The main contributions of this paper can be summarized as follows:

  • We present a vision-based pose analysis approach using only a single RGB camera.

  • We propose a visualization technique for live feedback to indicate pose accuracy.

  • We identify user needs and preferences for digital stretching coaches (online survey).

  • We report findings on the impact of our feedback on motivation and stretching performance (user study).

  • We validate our approach and highlight directions for future work (expert evaluation).

2 Related work

Our work builds primarily on research and developments in two major fields. Thus, we will first review body tracking technologies (Sect. 2.1), and then we describe methods for visual feedback in physical training (Sect. 2.2).

2.1 Body tracking

The basis of every interactive method for human motion analysis is a motion capture system. We can divide these systems into three basic categories: (1) user instrumentation with active sensors, (2) marker-based tracking, and (3) markerless camera tracking.

Most active sensors (e.g., wearable or hand-held) are based primarily on inertial sensors [13] that can detect changes in the users’ motions. Such sensors, as may be integrated into the smartphone or more recently a smartwatch, have the benefit of being usable in mobile scenarios, without requiring a fixed and calibrated lab installation. However, they are not capable of delivering absolute positions and therefore are subject to drift. Hybrid approaches exist, for example, the hand-held Nintendo game controller known as the Wiimote, for which tracking accuracy could be improved by complementing the inertial measurements with optical tracking through the integrated infrared camera and an extra infrared-emitting sensor bar.

Marker-based systems usually involve optical tracking with specialized cameras, where the markers may either be passive (i.e., reflecting light) or active (i.e., emitting light). The most accurate marker systems are those used in laboratory conditions, such as OptiTrack (Footnote 1), Vicon (Footnote 2), and Qualisys (Footnote 3). These systems can achieve 6DOF tracking with sub-millimeter accuracy.

The most common sensors in the markerless category are depth sensors such as the Microsoft Kinect (Footnote 4). These devices are generally more affordable and simpler to use than marker-based systems, and they do not require any instrumentation of the user.

A functional feedback exercise system using Kinect is the YouMove app created by Anderson and colleagues [14]. The users see themselves in a simulated mirror and are guided by visual indicators in the image that show where to move which limb. If a user reaches the target pose with sufficient accuracy, they are prompted by the system to stay in the position. Such depth-sensor systems share the disadvantage of requiring special hardware. In contrast, our approach requires only a standard RGB camera (e.g., webcam or smartphone camera). Surprisingly, even though exergames have been researched extensively, very few RGB camera-based systems can be found in the literature. Losilla and Rosique [15], Kanase et al. [16], and Elsayed et al. [17] follow a similar approach; however, their systems do not provide visual feedback and analysis of the current user pose.

Colyer et al. review the evolution of camera-based motion analysis until 2018 [18]. Since then, body pose detection methods based on deep neural networks have predominated, with common examples being OpenPose [19], AlphaPose [20], and MediaPipe [21]. Badiola-Bengoa and Mendez-Zorrilla discuss the use of such approaches for sports and physical exercise [22].

Fig. 2

The schematic diagram of the OffiStretch system illustrates the user input via camera, information processing, and output displayed on the screen in the form of an augmented mirror

Fig. 3

Screenshot of the OffiStretch application with real-time dynamic feedback drawn onto the trainee’s own body. The arrows encourage the trainee to extend the stance and the green circle encourages greater flexion at the knee joint. The closer the trainee is to the desired position, the smaller the circle or the thinner the line

2.2 Visual feedback for physical training

Beyond gaming, domains such as fitness, health, and well-being have actively adopted new technologies, for example for tracking physical exercise and displaying the user’s real-time exertion and daily activity on a smartphone or watch. The availability of compact and portable displays has led to a wide variety of visualizations for training progress, from displaying step counters or traveled routes on a map, to ECG-like heart rate visualizations, e.g., on the Fitbit (Footnote 5), or “rings” on the Apple Watch (Footnote 6). Such visualizations address people’s craving for a sense of progress and achievement, as well as their wish to monitor their own health and performance. While these visualizations can reflect a user’s progress toward their set training goal, they usually provide only aggregated data and do not analyze the poses of individual body parts during the motion to assess their correctness. Failure to do so may lead to less effective workouts and can even risk adverse effects such as physical injury. This aspect may be addressed by live visual feedback of the user’s posture and motion, which has been found to positively impact mood [23] and physical well-being [24], and can guide the user to perform movements correctly, as is critical for a range of sports like dancing [25, 26], TaiChi [27, 28], or tennis [29]. Arguably, an increased number of tracking points and accuracy of pose reconstruction can support this better (e.g., approximation of motions in Ring Fit Adventure, see Footnote 7, vs. accurate full body tracking [28,29,30]). Related research has explored a variety of different feedback visualizations, with the most common designs involving a kind of mirror image [23, 24, 31, 32], a third-person perspective of oneself [25,26,27], or superimposed feedback on the body seen from a first-person perspective [27, 30].

Closely related to our work is that of Elsayed et al. [17], who describe the current trends in motion capture systems and their use for home exercise. They compare three different forms of feedback for matching the user’s pose with a static posture: a silhouette, a skeleton, and a predefined avatar. An evaluation of this system revealed poor visibility of participants’ own bodies through the displayed skeleton, a lack of feedback about which body part was not oriented or positioned correctly, and a lack of audio feedback. A second work, Pose Tutor by Dittakavi et al. [33], detects the trainee’s position and compares it with a predefined position based on the k-nearest neighbors algorithm. This system is a position comparator rather than a complete exercise application. Another similar approach is the 3D camera-based system AIFit, presented by Fieraru et al. [34]. However, this system uses multiple cameras and the feedback cannot be overlaid directly onto the image.

The main difference between our method and the state of the art is the precise camera-based assessment of body pose in real time during exercise, presented through a live feedback mechanism to enhance stretching exercises at home or in the workplace. Unlike conventional fitness trackers and exergames, which focus on general activity tracking or gamification without precise feedback on exercise form, OffiStretch employs a vision-based method to assess and correct the user’s body pose against a pre-defined target pose. Moreover, OffiStretch utilizes only a standard RGB camera as its motion capture system, ensuring accessibility and ease of use.

3 The OffiStretch system

In this section, we describe our methods for pose analysis and visualization. Additionally, we provide details about the design and development of our application. The name OffiStretch hereby reflects our motivation to encourage and provide interactive guidance during stretching in the (home) office. By comparing the captured stretching pose (from the video stream) to the pre-defined target pose (static position), we assess the correctness of the user’s stretching performance. The result is visualized to the user as a live video stream with visual feedback on an augmented digital mirror. The schematic diagram of the proposed system can be seen in Fig. 2, and the user interface can be seen in Fig. 3.

3.1 Body tracking and pose assessment features

Our application uses the OpenPose [19] system to detect the human skeleton. This approach utilizes image recognition using a deep neural network. To reconstruct the user’s pose, the system attempts to match patterns for 25 individual human body parts (keypoints) in the input image. For each of these keypoints, shown in Fig. 4, the system builds probabilistic heatmaps based on the typical human motion range and then reconstructs the entire human skeleton from these relative keypoint positions. The OpenPose system thereby achieves very high estimation accuracy, with errors in measured angles reported between 0.19\(^{\circ }\) (pelvis joint) and 3.17\(^{\circ }\) (right shoulder) [35].

Fig. 4

A model showing the 17 keypoints that we use to calculate features characterizing human posture. We use the same keypoint indexing as the original 25-keypoint OpenPose model [19] from which our model is derived

The advantages of the system are its robustness to lighting conditions and video quality and its minimal setup requirements [19]. The system can be used almost anywhere with any camera. The only condition for successful pose detection is that no other person or image of a person (photograph, poster, drawing) is simultaneously in view.

Due to the use of a single static camera, the user’s body pose is captured in 2D space. Thus, the trainee must perform the exercises allowing the image sensor a clear (frontal or profile) perspective of the body. We achieve the correct orientation of the trainee to the camera by showing the trainer’s video as a guide. The video of the trainer is presented next to the simulated mirror where users can see themselves. Hence, trying to mimic the trainer’s posture in the mirror leads users to orient themselves correctly. This method of pose matching is already well-known from previous work [25,26,27, 30].

Finally, pose matching is performed as a real-time comparison of the defined target pose pre-recorded by the trainer (reference pose) with the tracked pose of the trainee. The static target pose is described by a number of parameters consisting of the following three measurements: the angle between three keypoints (joint angles), the screen-space orientation of the vector between two keypoints (keypoint-pair orientation), and the relative distances between keypoint-pairs. Details on their computation are provided below and the visual description of these three types of features can be seen in Fig. 5.

3.1.1 Joint angles

The angle between two vectors, constructed by connecting three keypoints A, B, and C, serves as a basic parameter to describe their mutual constellation. This can be used to reflect tracked joint angles, as seen from the camera perspective. For example, the degree of flexion of the elbow is measured as the angle between the upper arm and forearm, which is described by the keypoints shoulder (A), elbow (B), and wrist (C). This angle is computed in the 2D Cartesian coordinate system by using the dot product as follows.

$$\begin{aligned} \sphericalangle {ABC} = \arccos {\frac{ \overrightarrow{BA} \cdot \overrightarrow{BC}}{ |{BA}| |{BC}| }} \end{aligned}$$
(1)

Our system focuses on the absolute difference from the correct angle, utilizing the arccos function’s 0\(^{\circ }\) to 180\(^{\circ }\) range, to ensure feedback relevance in a 2D plane. As the arccos function is insufficient to distinguish between clockwise and counterclockwise rotations, additional pose features are used in each exercise to ensure the correctness of the body pose.
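The computation in Eq. 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `joint_angle` and the pixel coordinates are assumptions, and keypoints are taken to be plain `(x, y)` tuples in screen space.

```python
import math

def joint_angle(a, b, c):
    """Angle ABC in degrees (0..180) at vertex b, for 2D keypoints (x, y)."""
    bax, bay = a[0] - b[0], a[1] - b[1]   # vector BA
    bcx, bcy = c[0] - b[0], c[1] - b[1]   # vector BC
    norm = math.hypot(bax, bay) * math.hypot(bcx, bcy)
    if norm == 0.0:
        raise ValueError("coincident keypoints: angle is undefined")
    cos_angle = (bax * bcx + bay * bcy) / norm
    # Clamp against floating-point overshoot before calling arccos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))

# Hypothetical pixel coordinates for shoulder (A), elbow (B), wrist (C).
shoulder, elbow, wrist = (100.0, 100.0), (100.0, 200.0), (200.0, 200.0)
elbow_angle = joint_angle(shoulder, elbow, wrist)

# Mirroring the wrist across the upper arm yields the same angle, which
# illustrates why arccos alone cannot separate the two bending directions.
mirrored_angle = joint_angle(shoulder, elbow, (0.0, 200.0))
```

Both calls return a right angle, so a second feature (for example a keypoint-pair orientation) would be needed to tell the two poses apart.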

3.1.2 Keypoint-pair orientation

In everyday life, we commonly refer to the horizontal or vertical axis to describe the correct orientation of a body part, which we can formalize based on the relative orientation of keypoint-pairs. For example, the T-pose is commonly understood as a vertical alignment of the spine (e.g., the vector from neck to pelvis: keypoints 1 and 8 in Fig. 4), straight vertically aligned legs (i.e., vectors between hips and feet: v(9, 11), v(12, 14)), as well as horizontal alignment of both arms (i.e., shoulder to wrist: v(2, 4), v(5, 7)). Assuming perfect horizontal alignment of the camera, Eq. 2 defines the 2D direction of the vector for the keypoint-pair (A, B) in relation to the horizontal (x) axis.

$$\begin{aligned} \measuredangle {\overrightarrow{AB}} = \arctan {\frac{ \overrightarrow{AB}_y}{ \overrightarrow{AB}_x}} \end{aligned}$$
(2)
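A minimal Python sketch of Eq. 2 follows. It makes one substitution that is an assumption on our side: instead of evaluating arctan of the quotient directly, it uses `math.atan2` and folds the result back into the arctan range, so that a perfectly vertical vector (zero x component) does not cause a division by zero. The function name and the coordinates are hypothetical, and the sign of the result depends on whether the y axis points up or down in the image.

```python
import math

def pair_orientation(a, b):
    """Orientation of vector AB against the x axis, in degrees, in (-90, 90]."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0.0 and dy == 0.0:
        raise ValueError("coincident keypoints: orientation is undefined")
    angle = math.degrees(math.atan2(dy, dx))   # value in [-180, 180]
    # Fold into the arctan range so that AB and BA give the same orientation.
    if angle > 90.0:
        angle -= 180.0
    elif angle <= -90.0:
        angle += 180.0
    return angle

# Hypothetical pixel coordinates: neck (keypoint 1) and pelvis (keypoint 8).
spine_orientation = pair_orientation((200.0, 100.0), (200.0, 300.0))
# Hypothetical shoulder-to-wrist vector of a horizontally stretched arm.
arm_orientation = pair_orientation((150.0, 120.0), (50.0, 120.0))
```

With these inputs the spine evaluates to a vertical orientation and the arm to a horizontal one, matching the T-pose description above.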

3.1.3 Relative distances

Apart from angles and orientations, distances also play an important role in describing body poses, e.g., placing one’s feet hip distance apart. To normalize measured distances between keypoints by a user-specific proportion, we calculate relative distances with respect to the user’s spine length: The following formula describes the distance between two keypoints A and B divided by the distance between keypoints \(K_1\) and \(K_8\) (i.e., the keypoints 1 and 8 in Fig. 4), measured in pixels. Due to this normalization, we do not need to consider the user’s height or distance from the camera when calculating similarity to the reference pose.

$$\begin{aligned} |AB|_r = \frac{ |AB|}{|K_{1}K_{8}|} \end{aligned}$$
(3)

Fig. 5

Three types of features used in our pose assessment
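The relative distance of Eq. 3 can be illustrated as follows. This is a sketch under assumptions: keypoints are stored in a dictionary indexed as in Fig. 4, the helper name `relative_distance` is hypothetical, and the coordinates are made up for the example.

```python
import math

NECK, PELVIS = 1, 8  # keypoint indices used for the spine length (Fig. 4)

def relative_distance(keypoints, i, j):
    """Distance between keypoints i and j divided by the spine length."""
    spine = math.dist(keypoints[NECK], keypoints[PELVIS])
    if spine == 0.0:
        raise ValueError("spine length is zero: cannot normalize")
    return math.dist(keypoints[i], keypoints[j]) / spine

# Hypothetical pixel coordinates; 11 and 14 are the foot keypoints.
pose = {1: (200.0, 100.0), 8: (200.0, 300.0), 11: (150.0, 500.0), 14: (250.0, 500.0)}
feet_apart = relative_distance(pose, 11, 14)

# The same pose captured at twice the image scale (user closer to the camera).
pose_near = {k: (2.0 * x, 2.0 * y) for k, (x, y) in pose.items()}
feet_apart_near = relative_distance(pose_near, 11, 14)
```

Both values are identical, which is the scale invariance that the normalization by spine length is meant to provide.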

3.1.4 Static pose description features

We can describe each human pose by calculating a number of parameters from the 17 keypoints, based on the measurements described above. Through various keypoint combinations, based on experts’ discussion, we defined 109 pose features to represent any body pose: 60 relative distances between keypoint-pairs, 25 joint angles (between three keypoints), and 24 keypoint-pair alignments. The list of all defined features is available in the supplementary material. Each pose for a given exercise can be stored as a feature vector F:

$$\begin{aligned} F=(a_1, ..., a_M, b_1, ..., b_N, l_1, ..., l_K) \in \mathbb {R}^{M+N+K} \end{aligned}$$
(4)

where:

  • M: is the number of joint angles (25)

  • N: is the number of keypoint-pair orientations (24)

  • K: is the number of relative distances (60)

However, only a subset of these features is used to assess body pose correctness for each exercise. Table 3 defines the individual selections of features for the exercises used in our study. For example, for the exercise Arm Prayer Stretch (APS), M = 2, N = 1, and K = 1. These subsets were defined in consultation with physiotherapists, based on the most relevant and prominent body part configurations required for each exercise.
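To make the structure of the feature vector in Eq. 4 and the per-exercise selection more concrete, consider the following sketch. All feature names and values are hypothetical placeholders (the actual 109 features are listed in the supplementary material), and the selection merely reproduces the counts of the APS example, not its actual features.

```python
def feature_vector(angles, orientations, distances):
    """Concatenate features in the order of Eq. 4: angles, orientations, distances."""
    return (list(angles.values())
            + list(orientations.values())
            + list(distances.values()))

def select(features, names):
    """Keep only the named features, preserving the requested order."""
    return {name: features[name] for name in names}

# Hypothetical, heavily shortened feature sets (names are illustrative only).
angles = {"elbow_r": 95.0, "elbow_l": 93.0, "knee_r": 178.0}
orientations = {"spine": 90.0, "forearm_r": 10.0}
distances = {"wrists": 0.1, "ankles": 0.6}

full_vector = feature_vector(angles, orientations, distances)

# A selection with M = 2 angles, N = 1 orientation, and K = 1 distance.
subset_vector = feature_vector(
    select(angles, ["elbow_r", "elbow_l"]),
    select(orientations, ["spine"]),
    select(distances, ["wrists"]),
)
```

Keeping named features rather than bare indices is a design choice of this sketch; it makes a manual selection by a physiotherapist easier to read and to audit.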

3.2 Exercise instruction authoring

The authoring of instructions for a new exercise is achieved simply by including a new video recording of a trainer performing the exercise. Importantly, when recording, attention must be paid to the correct orientation of the trainer in relation to the camera position to ensure good visibility of relevant keypoints for accurate body tracking. The target pose features for the given exercise are then computed from a single manually selected frame in the video, where the trainer is in the static target pose, performing the full stretch. In the system, each exercise is then stored as a video and configuration file. The latter contains details such as the video name, frame, and all 109 descriptive parameters for the target pose. As mentioned before, only a few of these features describe the exercise, while others may not be accurately detectable due to the user’s orientation, or can be considered irrelevant for the particular exercise (e.g., elbow angles may be irrelevant for the calf stretch, but critical for the lower arm stretch). This set of most relevant features is manually selected (ideally by professional physiotherapists) and recorded in the configuration file. Pose features vary across exercises, but typically each exercise is described by three to five pose features. These selected parameters are then used to evaluate the error between the trainee’s pose and the target pose, which results in the real-time pose assessment that can be visualized using visual feedback explained below.
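The following sketch shows what such a configuration file and the resulting error computation could look like. The JSON layout, key names, and feature names are assumptions made for illustration only; the source does not specify the file format, and only three of the 109 target parameters are shown. The signed error convention (measured minus target) is also an assumption.

```python
import json

# Hypothetical configuration layout, not the authors' actual file format.
CONFIG_TEXT = """
{
  "video": "calf_stretch.mp4",
  "target_frame": 240,
  "target_features": {
    "angle_r_knee": 180.0,
    "orient_spine": 90.0,
    "dist_ankles": 1.0
  },
  "selected_features": ["angle_r_knee", "dist_ankles"]
}
"""

def pose_errors(config, measured):
    """Signed error (measured minus target) for the selected features only."""
    target = config["target_features"]
    return {name: measured[name] - target[name]
            for name in config["selected_features"]}

config = json.loads(CONFIG_TEXT)

# Hypothetical features measured on the trainee in the current frame.
measured = {"angle_r_knee": 172.0, "orient_spine": 80.0, "dist_ankles": 0.75}
errors = pose_errors(config, measured)
```

The spine orientation is stored in the target pose but ignored in the error, mirroring the idea that features irrelevant to an exercise remain in the file without influencing the assessment.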

3.3 OffiStretch visual feedback

The visual user interface is intended for presentation on a PC monitor or TV screen. The GUI of our application contains two main windows (Fig. 3): The left window shows the video clip of the trainer with a superimposed countdown and other information about the exercise. On the right side, the users can see themselves in a webcam-simulated digital mirror. To ensure a correct perspective, the camera must be mounted on the respective display.

Each exercise begins with a brief prerecorded verbal explanation of the exercise and a loop of the instruction video showing the trainer performing the stretch. Then, the user is informed that it is their turn to start the exercise (through voice recording and text as shown in Fig. 3). In this phase, the left window shows a still frame of the trainer in the target pose and a countdown indicating the duration for which the stretching pose should be maintained. Meanwhile, the webcam-simulated digital mirror is augmented with feedback elements to guide the user to improve their pose in real time. Further, every 5 s the user receives audio feedback in the form of a voice recording commenting on whether the body pose is correct (within the defined tolerance levels) or needs further adjustment. After the timer has run out, the system starts a new exercise.

The presented elements of visual feedback depend on the chosen set of pose features for which errors are computed in each exercise. We use the following two types of visual feedback to display these errors, as illustrated in Fig. 3:

Circles. Any errors in angle (i.e., joint angles and keypoint-pair alignment) are indicated by a circle that is centered on the second keypoint. The size of this circle reflects the magnitude of the difference between the trainee’s pose and the target pose. As the trainee adjusts their pose, the circle gets smaller or larger, conveying whether or not the actual pose is getting closer to the intended pose. The circle disappears when the joint angle or keypoint-pair alignment is correct (within the defined tolerance threshold, which was experimentally set to 3 degrees).

Lines with arrows. Error in the relative distance between two keypoints is visualized by a line drawn between them. The magnitude of the error is reflected by line thickness: a thicker line indicates a greater mismatch. Arrow tips at the end of the line indicate in which direction the keypoints should move to correct the pose. Further, the line is colored green if the distance is smaller than desired, and red if it is too large. As the trainee adjusts the pose, the lines are updated in real time, reflecting the progress toward correct stretch execution. As with the circles described above, the lines also vanish once the correct target distance (within the defined tolerance threshold, which was experimentally set to 0.2) is achieved.
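The mapping from pose errors to the two feedback elements can be sketched as below. Only the two tolerance thresholds (3 degrees and 0.2) and the color rule come from the description above; the scaling factors, size limits, function names, and the choice to treat the threshold as inclusive are assumptions for illustration.

```python
ANGLE_TOLERANCE_DEG = 3.0   # threshold stated in the text
DISTANCE_TOLERANCE = 0.2    # threshold stated in the text (relative distance)

def circle_radius(angle_error_deg, px_per_degree=2.0, max_radius=120.0):
    """Radius in pixels for an angle error, or None if the circle is hidden."""
    error = abs(angle_error_deg)
    if error <= ANGLE_TOLERANCE_DEG:
        return None                      # within tolerance: no circle drawn
    return min(max_radius, error * px_per_degree)

def distance_line(measured, target, px_per_unit=20.0, max_thickness=12.0):
    """(color, thickness) for a relative-distance error, or None if hidden."""
    difference = measured - target
    if abs(difference) <= DISTANCE_TOLERANCE:
        return None                      # within tolerance: no line drawn
    color = "green" if difference < 0 else "red"   # too small vs. too large
    thickness = min(max_thickness, max(1.0, abs(difference) * px_per_unit))
    return color, thickness

small_error_circle = circle_radius(2.0)
large_error_circle = circle_radius(10.0)
narrow_stance_line = distance_line(0.5, 1.0)
wide_stance_line = distance_line(1.5, 1.0)
```

A linear mapping with a cap is only one option; any monotonic mapping would preserve the property that the element shrinks as the trainee approaches the target pose.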

3.4 Hardware and software requirements

The core component of our system is OpenPose [19], with which real-time processing is possible, albeit computationally demanding. Using a laptop with an Nvidia GTX 1070 GPU, we achieved 16 fps. Application of our approach for a more dynamic exercise or running the system on a low-performance device can reasonably be assumed possible, but it would require optimization of the way we compute the keypoints. Possible options include cloud processing of data or using one of the frameworks designed for lower-performance devices such as Google TensorFlow Lite (Footnote 8).

Table 1 Questions of our online survey, used to evaluate our two hypotheses. Each question was answered twice (once for home office condition and once for dedicated workplace condition). Answers were listed in the opposite order in the questionnaire and we inverted them for consistency of visualization within the publication

4 Online survey: stretching in the (home) office

During the design and development of our system, we conducted an online pre-study to investigate the stretching habits of people and their willingness to use an interactive system for stretching guidance. We asked participants to consider two particular conditions: working (1) in their home office and (2) at their dedicated workplace. The study was designed as an online survey with quantitative and qualitative items. We aimed to study the following two hypotheses:

H 1

People do more stretching exercises during the day when working in the home office compared to their dedicated workplace.

H 2

People would prefer to try using interactive stretching guidance in their home office compared to their dedicated workplace, and could also imagine doing so more frequently at home.

The questionnaire was answered twice by all participants (within-groups design), with fixed order of scenarios: First, the questions were asked about the home office and then about the dedicated workplace. Our H1 was addressed by question Q1, while Q2 and Q3 allowed us to explore H2 (see Table 1). Further, demographic information was collected and open questions were asked to investigate exercising habits and awareness of the negative effects of a sedentary lifestyle on participants’ health and well-being. It should be noted that survey participants were asked to imagine a system that interactively provides stretching guidance on a display, but we did not specify exactly how this system should work or what it would look like. Hence, the details about the systems they envisioned may differ, e.g., based on their prior experiences with smart mirrors or body tracking games. However, as we merely aimed to assess participants’ general willingness to use a guidance system based on display technology, we deem these potential differences irrelevant.

4.1 Online survey participants

We collected 90 survey responses from 55 men and 35 women. The age distribution of participants in predefined age groups was the following: 9 people in the group between 18–25 years, 28 in the group 26–33, 20 in 34–41, 13 in 42–49, 7 in 50–57, 9 in 58–65, and 4 participants in the group over 65 years.

More than 90% of the participants indicated a job in academia with low physical demand and many sitting hours. With regards to nationality, 39 participants came from Czechia, 14 from Slovakia, 12 from Austria, 6 from Denmark, and 22 from other countries. Participants who could not respond to questions in both conditions (16/90), because they had no experience of working both in home office and their dedicated workplace, were excluded from the following quantitative analysis.

Fig. 6

Responses to Q1 reflect how often participants perform stretching exercises, Q2 indicates the preferred frequency of stretching with an imaginary coaching system and Q3 reveals participants’ willingness to use a coaching system that reminds and instructs them to do stretching. The home-office scenario is presented in blue (left side) and dedicated workplace in red (right side). For more details see Table 1

4.2 Online survey results

4.2.1 Stretching activity and coaching preferences

Statistical analysis by Wilcoxon signed-rank test was performed on the quantitative responses to Q1, Q2, and Q3 (see Table 1) to explore our hypotheses (H1, H2). For all three questions participants’ responses, visualized in Fig. 6, differed significantly between conditions: participants indicated that they performed stretching exercises significantly more often in the home office (median = 5: “multiple times per week”), compared to their dedicated workplace (median = 4: “at least once weekly”) (H1). Further, they could also imagine using a digital coaching system more frequently at home (median = 6: “once per day”) compared to the workplace (median = 5: “multiple times per week”), and they responded with higher willingness to try such a system that reminds and instructs them to stretch in the home-office scenario (median = 6) compared to the dedicated workplace (median = 5) (H2). While this supports both our hypotheses, it should be noted that responses were very positive for both scenarios, generally indicating healthy stretching habits and high acceptance of using a digital coach. Detailed results are provided in Table 2.
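For readers unfamiliar with the test, the following sketch computes the Wilcoxon signed-rank sums for paired ordinal responses. It is a plain-Python illustration under assumptions: zero differences are discarded, ties receive average ranks, and no p-value is derived. The response values are invented for the example and are not study data; the software used for the reported analysis is not stated in the text.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, n) for paired samples, dropping zero differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1   # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, len(diffs)

# Invented ordinal answers of eight respondents (not data from the survey).
home_office = [5, 6, 4, 5, 7, 3, 5, 6]
workplace = [4, 4, 4, 5, 6, 4, 3, 5]
w_plus, w_minus, n = wilcoxon_signed_rank(home_office, workplace)
```

The smaller of the two rank sums serves as the test statistic; a value far below its expectation under the null hypothesis indicates a systematic difference between the two conditions.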

Table 2 The results of significance assessment by Wilcoxon signed-rank test. The significance of differences between home office and dedicated workplace conditions was assessed for each question from Table 1

4.2.2 Reported health issues, risks awareness and exercising habits

In response to open questions, participants reported existing health issues and their knowledge of the potential effects of a sedentary lifestyle, and provided details on their exercising habits while at the workplace or home office. We coded and analyzed this data in MaxQDA software. The codes were grouped into 3–7 themes per question [36].

Of the total 90 participants, 19 reported preexisting diagnosed health conditions. The most common were pain or mobility issues in the back (9), shoulders (4), and knees (3). When asked whether they were aware of any possible physiological problems caused by a sedentary lifestyle, 61/90 participants gave a positive answer. As examples they listed back pain (24), neck pain (11), wrist issues (11), pain in other joints (6), headache (5), and in lower numbers also heart and blood circulation problems, mental health issues, etc.

In the questions asking about the participants’ exercising habits, sources of exercising tips, and obstacles preventing them from exercising, the answers varied depending on the scenario (home office, dedicated workplace). The findings from coding the open questions explain the results from Q1-Q3: People prefer to exercise outside of a dedicated workplace because they do not feel comfortable exercising in front of their coworkers, as one of the participants stated: "I would feel weird doing stretching in the office with my colleagues present." This reason for not exercising at their dedicated workplace was listed by 25/90 people (22.5%). Other obstacles listed for both home office and workplace were related to personal discipline (laziness, lack of motivation, non-existing routine, and forgetting to stretch) with 39.6% of received answers for home office and 22.5% for the dedicated workplace. Workload or tight schedules were also mentioned for both scenarios (25.2% at home, 29.7% at work). Unsuitable space was predictably more often mentioned for the dedicated workplace (14.4%) than at home (3.6%).

From these answers, we infer a high willingness to stretch with a digital coach. We expect that OffiStretch could help people to exercise, especially in their home office setting, where several limitations (coworkers, space) are absent and the coaching system could help with motivational aspects (personal discipline).

5 User study

Upon completion of our OffiStretch prototype, we performed a lab study to evaluate the overall functionality, motivation impact, and potential of our proposed digital coach. In particular, we aimed at exploring the effect of our live visual feedback on users’ motivation and performance in stretching.

5.1 Study design

To evaluate our methods for motion assessment and visual feedback, we compared two conditions in a within-group design (counterbalanced order):

  • NonVF - video guidance and webcam-simulated mirror without augmentation,

  • VF - video guidance and webcam-simulated mirror augmented by real-time visual feedback about pose correctness.

Both conditions involved the same video recordings showing a trainer performing each stretching exercise, as well as a verbal description (audio recording) of the stretch at the beginning of each exercise. In VF, users additionally received real-time audiovisual feedback about the correctness of their actual pose.

We investigated the following hypotheses in the study:

H 3

Stretching is performed more correctly with visual feedback (VF) than with videos only (NonVF).

H 4

Live visual feedback about stretching performance induces greater motivation to stretch (and perform stretches regularly) (VF) compared to NonVF.

H 5

Users prefer stretching with our proposed visual feedback (VF) more than with video guidance only (NonVF).

H 6

Our proposed visual feedback for stretching is perceived as effective in terms of (a) understanding/clarity, (b) helpfulness of guidance, and (c) subjective performance.

Data for assessment of performance (i.e., correct stretching, H3) was acquired by direct error measurement in comparison to the reference pose and enriched by qualitative analysis by physiotherapists. The other hypotheses were explored through questionnaires.

5.2 Study procedure

All participants completed a set of six exercises twice, once in the VF condition and once in the NonVF condition. To avoid the effects of order, conditions were counterbalanced resulting in two groups of participants. Upon arrival, all participants were informed about the procedure and data collection, signed their informed consent, and completed an initial questionnaire with personal background information. The first group started with the VF condition and the second with NonVF. After performing a set of 6 exercises with a given condition, they completed a questionnaire reflecting on the activity just performed. The first group continued with the NonVF condition and the second group with VF condition. Afterward, the participants again completed a questionnaire reflecting on the exercise set they had just completed. At the end of the experiment, they completed a questionnaire asking about differences between the exercise sets with different conditions.

5.3 Selected exercises

The six exercises were selected to cover full-body stretching. During the selection of exercises, we also paid attention to easy detectability with our single-camera body tracking approach. The following exercises were selected for our user study (Fig. 7):

  1. (APS) Arm Prayer Stretch

  2. (BER) Bent Elbow Right Side

  3. (CSR) Calf Stretch Right

  4. (LDM) Latissimus Dorsi Muscle Stretch

  5. (SHA) Standing Hamstring

  6. (SHS) Standing Hamstring Stretch Right

Based on pilot testing, we empirically selected a small number of suitable keypoints as features for each exercise. These are listed in Table 3.

Table 3 For each exercise, a unique combination of features and feedback elements was experimentally selected. The numerical values correspond to the keypoints in Fig. 4

Fig. 7 Reference position for comparison with the trainee. A set of six exercises (performed by each participant twice; once with feedback and once without feedback)

5.4 Participants

The user study was conducted with 14 participants (9 women and 5 men). The distribution in predefined age brackets was as follows: 5 people were between 18–25 years of age, 5 people were 26–33, 2 responded with 34–41, and 2 with 42–49. More than 90% of the study participants were from an academic environment, where physically demanding work is not prevalent. All participants agreed to be video-recorded for signal processing. The questionnaire responses were provided anonymously.

5.5 Signal processing

From the video recordings of users’ exercises, we exported the time series of all feedback element values for both executions (VF and NonVF). These feedback element values corresponded to the differences between each pose and the reference pose for a given exercise. In the next step, we calculated the mean values of these differences across the evaluated time interval. The mean differences were then aggregated across the pose features using a weighted average to obtain the final pose correctness metric for each exercise. In a post-hoc step during data analysis, the weights for each individual feedback element were defined by three professional physiotherapists. In summary, the following steps were taken to quantify the correctness of the motion performance with respect to the reference poses:

  1. The same time interval was used for all participants and all exercises, which was set by the countdown timer in the application. The participants practiced each exercise for exactly 30 s. The 15 s time interval from 0:10 to 0:25 was used for the metrics calculations to compare each exercise. The interval started at 0:10 because we assumed that the desired position had been reached by then. The interval ended at 0:25 to exclude the movements at the end of the exercise.

  2. An average value was determined from each time series over the selected 15 s interval for each pose feature. Thus, if the exercise was defined by 4 features, we obtained 4 average values for the exercise.

  3. Physiotherapists determined the importance of each feature for correct execution and thus determined the weight of the feature. For example, keeping the spine perpendicular to the ground was more important than keeping the feet together.

  4. The exercise performance was determined as a weighted average of all feature distances. The performance metric was compared between the two conditions for each exercise (Fig. 8).
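The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names `window_mean` and `pose_error`, the feature names, the `(time, error)` sample format, and the inclusive handling of the interval boundaries are all hypothetical choices made for the example.

```python
from statistics import fmean


def window_mean(samples, t_start=10.0, t_end=25.0):
    """Mean error of one pose feature inside the evaluated time window.

    samples: list of (time_in_seconds, error) pairs for a single feature.
    The window boundaries are treated as inclusive (an assumption).
    """
    values = [error for t, error in samples if t_start <= t <= t_end]
    if not values:
        raise ValueError("no samples inside the evaluated interval")
    return fmean(values)


def pose_error(feature_series, weights, t_start=10.0, t_end=25.0):
    """Weighted average of the per-feature mean errors for one exercise.

    feature_series: dict mapping feature name -> list of (time, error) pairs.
    weights: dict mapping feature name -> weight (e.g. assigned by experts).
    """
    means = {
        name: window_mean(samples, t_start, t_end)
        for name, samples in feature_series.items()
    }
    total_weight = sum(weights[name] for name in means)
    return sum(weights[name] * means[name] for name in means) / total_weight


# Hypothetical values for illustration only (NOT data from the study).
series = {
    "spine_angle": [(0, 9.0), (10, 2.0), (20, 4.0), (30, 9.0)],
    "feet_distance": [(5, 7.0), (12, 1.0), (25, 3.0)],
}
expert_weights = {"spine_angle": 3.0, "feet_distance": 1.0}
print(pose_error(series, expert_weights))
```

Dividing by the sum of the weights keeps the metric on the same scale as the individual feature errors, so the weights only need to express relative importance.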

5.5.1 Pose assessment by experts

After the study, all videos were presented to professional physiotherapists. The physiotherapists performed two tasks:

a) Determine the weight of each feedback element (pose feature) for each exercise in terms of the correctness of the exercise execution. The individual weights can be seen in Table 5.

b) Make an overall assessment of whether participants performed the exercise better with or without feedback.

5.6 Results

In this section, we first describe the results of pose matching between the reference motion and the trainee’s motion during the exercise using measured data from our system (Sect. 5.6.1). Second, we present our findings from the qualitative evaluation by physiotherapists (Sect. 5.6.2). This evaluation was done through a manual visual analysis of all recorded videos. Finally, we provide the results from our questionnaire, which investigated participants’ opinions regarding motivation, feedback clarity, correction ability, helpfulness of the coaching system, and user preference (Sect. 5.6.3).

5.6.1 Pose matching performance metrics

In order to evaluate how well the exercise was performed, we recorded all the movements during the exercise. For comparison, we used the stretching performance metric described in Sect. 5.5 using the selected set of pose features for each exercise.

The statistical results of differences between the reference pose and trainees’ poses can be seen in Fig. 8. This figure compares errors of poses between conditions with and without visual feedback. The performance of all 14 participants in the study was aggregated for each exercise. Although the boxplots show a trend towards lower error with feedback, we did not find a statistically significant difference between the execution of the exercises with and without feedback (Table 4).

Fig. 8 Values of metrics determining error of trainees’ poses with respect to reference poses. Metrics were weighted based on qualitative assessment of professional physiotherapists. Condition with visual feedback is displayed in blue and condition without feedback is shown in red

Table 4 Statistical significance of differences between conditions with and without visual feedback for each exercise. The results were calculated using Wilcoxon signed rank test
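As a rough illustration of the test named in Table 4, the following sketch computes an exact two-sided Wilcoxon signed-rank test for paired per-participant errors using only the Python standard library. It is not the analysis code used in the study. The function name `wilcoxon_signed_rank` and the input values are hypothetical; zero differences are dropped, tied absolute differences receive average ranks, and the full enumeration of sign patterns is only practical for small samples such as the 14 participants here.

```python
from itertools import product


def wilcoxon_signed_rank(errors_a, errors_b):
    """Return (W, p) for an exact two-sided Wilcoxon signed-rank test."""
    # Paired differences; zero differences carry no sign information.
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank |d| in ascending order; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while (end + 1 < n
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1

    rank_total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_observed = min(w_plus, rank_total - w_plus)

    # Under the null hypothesis every sign pattern is equally likely, so
    # enumerate all 2**n patterns and count those at least as extreme.
    extreme = 0
    for signs in product((False, True), repeat=n):
        w_pos = sum(r for r, s in zip(ranks, signs) if s)
        if min(w_pos, rank_total - w_pos) <= w_observed:
            extreme += 1
    return w_observed, extreme / 2 ** n


# Hypothetical pose errors for six participants (NOT data from the study).
non_vf = [10, 12, 14, 16, 18, 20]
vf = [9, 10, 11, 12, 13, 14]
w, p = wilcoxon_signed_rank(non_vf, vf)
print(f"W = {w}, p = {p:.4f}")
```

A paired, rank-based test fits the within-group design because each participant contributes one error value per condition and no normality assumption is needed.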

5.6.2 Qualitative comparison by professionals

A qualitative assessment was carried out using visual analysis. Three professional physiotherapists watched all videos taken during the study. They evaluated each exercise separately. They watched all videos where subjects performed the exercise with feedback, then watched the videos without feedback. Then, the professionals summarised the common features they found in the exercises with and without feedback. For each exercise, they described how feedback influenced the differences in performance. Based on the observations, the physiotherapists also commented on the appropriateness of the chosen feedback elements. The summary of this assessment is provided as follows:

  • (APS) Arm Prayer Stretch

    There are no significant differences seen in the user performance between VF and NonVF. In both cases, the participants performed the exercises equally well.

  • (BER) Bent Elbow Right Side

    It is evident that in this exercise people perform the exercise better with feedback than when just watching the video. However, despite the fact that they perform it better, in some cases, they do not perform it quite as well as the trainer.

  • (CSR) Calf Stretch Right

    For this exercise, the physiotherapists saw slightly better execution with feedback, but observation shows that the correctness of execution varies based on the physical proportions of each subject.

  • (LDM) Latissimus Dorsi Muscle Stretch

    For this exercise, the physiotherapists did not see any noticeable differences between the performances. They attributed this mainly to poorly chosen feedback elements.

  • (SHA) Standing Hamstring

    For this exercise, the professionals did not see any major differences in the performance of VF and NonVF. In the case of VF, some users are guided to keep both legs in a vertical position, which is desirable for this exercise. Without feedback, the legs are not kept in a vertical position, and buttock displacement occurs.

  • (SHS) Standing Hamstring Stretch Right

    In this exercise, the professionals observed worse performance in the VF variant. The feedback in this case forces people to get into positions they cannot hold. Here the choice of feedback elements was wrong. In this case, the elements should be chosen in a way that the front leg is extended at the knee. In the VF setting, the leg was bent and therefore the muscles that should be stretched by this exercise were not stretched.

Table 5 Based on the ex-post monitoring of the recordings with the subjects, professionals in the field of physiotherapy determined a weight for each feedback element. Thus, the table always displays the name of the exercise on a row and a list of feedback elements and their respective weights in a second column

5.6.3 Questionnaire analysis

The main goal of the questionnaires in our study was to investigate differences between conditions with and without visual feedback. We were interested in the subjective responses of participants regarding the understanding of instructions, helpfulness of guidance, subjective performance, motivation, and their preference between the two conditions.

The results of the questionnaire analysis can be seen in Fig. 9 and in Table 6. For the majority of measured factors, our visual feedback achieved better subjective ratings than the condition without visual feedback. This was not the case for the subjective performance where the condition without feedback was rated better. As we can see in Table 6, we did not find a statistically significant difference between the conditions for any of the measured factors.

Finally, the preference between stretching with and without visual feedback was measured by a subjective, two-alternative, forced-choice preference approach. Each participant had to select a preferred condition from the two experienced conditions, VF and NonVF. Out of 14 participants, 11 stated they preferred our visual feedback. A Chi-square non-parametric test suggests a significant preference for visual feedback (\(\chi^2 = 4.571\), \(p = 0.033\)).
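The reported statistic can be checked with a short calculation. The sketch below assumes a goodness-of-fit test against an equal 7/7 split with one degree of freedom, which reproduces the reported \(\chi^2\) value; the function names are hypothetical, and the closed-form tail probability via `math.erfc` holds only for one degree of freedom.

```python
import math


def chi_square_equal_split(counts):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


def chi_square_p_value_df1(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# 11 of 14 participants preferred VF, 3 preferred NonVF (counts from the text).
statistic = chi_square_equal_split([11, 3])
p_value = chi_square_p_value_df1(statistic)
print(f"chi2 = {statistic:.3f}, p = {p_value:.3f}")
```

With the counts from the text, each cell contributes \((11-7)^2/7 = (3-7)^2/7 = 16/7\), so the statistic is \(32/7 \approx 4.571\).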

The primary reason participants preferred OffiStretch over mere video guidance was its ability to provide immediate feedback for pose correction. As noted by one of the participants: "It helped me to put myself in the correct pose and to correct my posture. I find it very helpful since finding the right angle and posture is key for every stretching exercise to bring the desired benefits." The users also offered some ideas for improving the application such as adding more gamifying elements. One of the participants suggested reducing the correction requirements for triggering audio feedback: "It was stressful because many times the voice said to adjust my position. Maybe you could rethink the tolerance of the angles and decrease the number of times it corrects you."

6 Discussion

In our experiment, we used six exercises that cover stretching of different body parts. Based on visual analysis, the importance of visual feedback is not high for simpler exercises (APS). Some exercises take longer to understand (BER, SHA), some are challenging to perform and not all people can do them (SHS), so the feedback that informs trainees that they are not performing the exercise well can be frustrating. Some exercises can be performed worse with feedback than without feedback (SHS).

Fig. 9 Data collected from on-site questionnaire. The condition with our visual feedback is indicated in blue and the condition without feedback is indicated in red. Higher value on y axis means more positive response for a given factor. A Understanding of the instructions/visualization. B Helpfulness of guidance. C Subjective performance. D Motivation. E Preference

Table 6 Results of the on-site study questionnaire: statistical significance of differences in responses after the exercises with and without feedback. Statistical significance was assessed using the Wilcoxon signed-rank test

Our results indicate that feedback is in high demand by people and for some exercises we are able to design feedback that is useful according to the trainees. On the other hand, we cannot use all body pose features to simply compare each pose to its reference counterpart. Feedback needs to be looked at in a more complex fashion and each exercise needs to be considered individually (ideally with the advice of physiotherapists). Feedback must only be a supplement. The trainee needs to know that they are being monitored and that their efforts are being recorded and measured. For example, if we detect that a person has stopped exercising at all, we can give them feedback to try to continue.

Our online survey investigated the frequency of users’ stretching exercises (H1), preferred frequency of stretching with a coaching system (H2), and willingness to try a coaching application for stretching (H2) in two conditions: (1) home office and (2) dedicated workplace. Our results suggest that the home office scenario is rated significantly higher than a dedicated workplace for all three factors. Therefore, both hypotheses H1 and H2 were supported by the study results.

The online survey explored participants’ overall awareness of problems associated with a sedentary lifestyle, their willingness to use coaching technology, as well as where and how often such technology could be used. The outcome of the survey indicates the need for, and user preferences regarding, research and development of interactive coaching applications for stretching exercises. Complementary to the online survey, our on-site study explored how our proposed system performs in comparison to video guidance, indicating the effectiveness of our methods.

Hypothesis H3 in our on-site study, that people would perform the exercises better with feedback than without it, was not supported by the results of our quantitative error measurements. We observed that for four exercises (BER, CSR, LDM, and SHA) the error with feedback was lower than without feedback, while for two exercises the participants performed better without visual feedback (APS, SHS). These results are displayed in Fig. 8. None of these differences are significant. Moreover, the performance comparison of VF and NonVF conditions was augmented by the comments of professional physiotherapists. These professional comments reveal additional facts about the body pose assessment, for example, dependency of correctness on physical body proportions (CSR), improper selection of feedback elements (LDM and SHS), and forcing users into improper positions (SHS). These additional comments highlight the importance of the correct selection of feedback elements, individually for each exercise. The comments of professionals on each exercise are detailed in Sect. 5.6.2.

Hypothesis H4, that our system induces higher motivation to perform stretching at the moment (and also regularly) than videos, was only partially supported by our results. The subjectively reported motivation was higher for the condition with visual feedback than for the condition without visual feedback. However, the difference was not statistically significant.

Hypothesis H5, that our visual feedback for stretching is more preferred by the users than video guidance, was supported according to the results of our analysis of the forced-choice preference question.

Finally, hypothesis H6 focused on the differences in subjectively reported understanding of instructions, helpfulness of guidance, and performance (Fig. 9). This hypothesis was not supported by our results because, while the understanding of instructions and helpfulness of guidance were rated higher for the VF condition, the self-reported performance was higher for the NonVF condition. Interestingly, this result is in contradiction with the measured quantitative pose errors for some exercises. None of the differences between conditions (in the evaluation of H6) was significant.

6.1 Limitations and future work

The main limitation of the presented methods is that a single camera and digital mirror limit the possible orientations and postures in which the users can be tracked and see themselves. The system can therefore only be used for exercises that permit frontal poses [37] or side-view poses, while for others, feedback in a flexible third-person perspective may be advantageous [25, 28, 30].

Another limitation is the need for a very thorough selection of features for pose analysis and feedback elements. The results of our user study suggest that some of the elements that were selected and tested before the study were poorly chosen. This was only discovered after visual inspection by physiotherapy professionals. Thus, we emphasize that automated stretching coaching is an interdisciplinary problem: a technical solution is merely a tool, and the design of similar systems cannot be done without user studies and proper insights from domain experts. In our future work, we want to design a system that makes it easier to involve professionals in selecting and evaluating exercises, and to conduct a larger study with more exercises and more subjects.

Our system only works with stretching exercises where the person remains in a static position. The feedback is dynamic and works with video, but the comparison is only with the static position. For the design of dynamic exercises, the system would need to be significantly modified. For some exercises, it would be enough to add more static positions (squat, push up, pull up, and similar) while for others the system would need to be completely redesigned (running, dancing, martial arts).

Another factor with a critical impact on the success of an interactive training application is the intelligibility of the visual feedback. While our indication of joint angles and distances through circles and lines was understood by all study participants, they required prior instructions. This may be improved in the future, for example by presenting the correct pose as an overlay [25, 30] on the user’s mirror image.

Based on the feedback from the questionnaires, we will also work on adding more elements of gamification and competition that were suggested by the study participants. Further, the application should include possibilities to adapt exercises for users’ individual motion range (e.g., to accommodate physical disabilities), as well as user customization of difficulty level and training goals [23].

7 Conclusion

This paper proposes novel methods for pose analysis and visual feedback for personal stretching guidance. Our methods use a single RGB camera and interactive pose estimation to detect and match the body pose of a trainee to a reference pose for stretching exercises. Finally, the detected errors are visualized for the user as an interactive overlay on a webcam-simulated mirror. This allows the user to correct their body pose and thus improve their stretching performance. We present evidence from an online survey suggesting that people prefer to perform stretching exercises more in a home office scenario than at their dedicated workplace and that there is a high willingness to use a system for interactive stretching guidance. Further, we conducted an evaluation of our OffiStretch system in a lab, investigating users’ stretching performance when using our system compared to traditional video guidance. For this, we designed six stretching exercises in collaboration with professional physiotherapists. Our study reveals the importance of tailoring feedback elements to each exercise and highlights the relevance of domain knowledge when designing a system for stretching guidance. The hypotheses, testing the effectiveness of visual feedback for stretching exercises, yielded mixed results: while users prefer live visual feedback over plain video guidance, it does not universally enhance performance. Future work will focus on refining feedback mechanisms through extensive collaboration with physiotherapy professionals, expanding the system’s capability to support a wider range of exercises and enhancing user engagement through gamification elements.