Abstract
Purpose
Increasing access to marker-less technology has enabled practitioners to obtain kinematic data more quickly. However, many of these methods lack validation. Therefore, this study explored the validity of pre-trained neural networks, compared to reflective marker tracking, for assessing sagittal plane cycling motion.
Methods
Twenty-six cyclists were assessed during stationary cycling at self-selected cadence and moderate intensity exercise. Standard video from their sagittal plane was obtained to extract joint kinematics. Hip, knee, and ankle angles were calculated from marker digitisation and from two deep learning-based approaches (TransPose and MediaPipe).
Results
Typical errors ranged between 1 and 10° for TransPose and 3–9° for MediaPipe. Correlations between joint angles calculated from TransPose and marker digitisation were stronger (0.47–0.98) than those from MediaPipe (0.25–0.96).
Conclusion
TransPose seemed to perform better than MediaPipe but both methods presented poor performance when tracking the foot and ankle. This seems to be associated with the low frame rate and image resolution when using standard video mode.
Introduction
Bicycles have been used as a form of active transportation due to their known health benefits and lower environmental impact than motor vehicles [1]. In addition, cycling is a very popular sport with a strong history of scientific engagement to improve performance and reduce the risk of injuries [2]. Video analysis is among the most used methods to alter movement in order to reduce the risk of injuries and improve performance [3]. Modern systems enable three-dimensional assessment of human movement but are limited to laboratory environments or are prohibitively expensive, which limits their use in most clinics and recreational sports settings. Some systems utilise wearable sensors (e.g. LEOMO®) or a dual-camera setting which integrates data in three-dimensional space (i.e. RETUL®). However, RETUL® does not disclose the cost of the system to those who do not take part in their training module, and it extracts pre-determined variables that have not been based on scientific data. The LEOMO® has been shown to produce moderate levels of agreement (ICC = 0.52–0.71) for some outcomes [4], which are also not supported by robust scientific evidence.
Traditionally, analysis of movement of cyclists has been undertaken on a stationary ergometer/trainer using two-dimensional (2D) video footage [5,6,7]. Markers are attached to the cyclist’s body and should be visible in the video frame to enable tracking in real time or after the video is recorded. This method enables clinicians and coaches to explore the sensitivity of joint angles to changes in exercise intensity, pedalling cadence, fatigue, and body position on the bicycle [8,9,10]. However, tracking markers is time consuming and depends on the skill level of the practitioner palpating the appropriate bony landmarks [11]. This limits the large-scale use of quantitative movement analysis in clinical settings.
The rapid development of trained neural networks to identify key human joint locations has provided an opportunity to streamline the analysis of videos (e.g. marker tracking). Neural network approaches to 2D human pose estimation are based around training a large model with input/output pairs, where the input is an RGB image and the output is a complete set of 2D joint locations. After training is complete, the model approximates a function for mapping images to joint locations. Even though studies have explored the validity of marker-less methods in determining joint angles [12, 13], only two studies explored the validity of pre-trained neural networks for cycling movement [14, 15]. Data from these studies suggest that a popular convolutional neural network (CNN) method for pose estimation proposed by Microsoft Research Asia [16] results in errors between 3 and 12° whilst OpenPose [17] led to errors of 4–22° in relation to a criterion measure [14, 15]. These errors would be potentially larger than the range proposed to determine body position on the bicycle (i.e. 10°; [18, 19]). In addition, data from Bini et al. [15] demonstrated that utilising a statistical parametric mapping method (i.e. SPM; [20]) provides a temporal comparison between the marker-less and the marker-based datasets to fully determine sections of the crank cycle where a given method is less accurate. This method should be implemented when assessing the validity of other marker-less methods.
With this in mind, this study examined the validity of two neural networks pre-trained to track key human joint locations in images (i.e. TransPose and MediaPipe) with potential ability to improve tracking of body segments. TransPose-R-A4 [21] was selected as a model representative of state-of-the-art accuracy in 2D human pose estimation. This model architecture incorporates a ResNet backbone [22] with a Transformer encoder [23] and requires a computer equipped with a GPU device for timely inference. MediaPipe BlazePose GHUM Heavy [24] was selected as a model representative of state-of-the-art efficiency in human pose estimation, since its optimised architecture enables inference in a range of computational environments including on smartphones and within web browsers. For validation of joint angles calculated using data from these networks, tracked reflective markers (reference) were utilised with the hypothesis that both networks would provide acceptable agreement in relation to the reference data.
Materials and methods
Twenty-six cyclists (four females and twenty-two males; 37 ± 10 years of age, 178 ± 9 cm of stature and 80 ± 11 kg of body mass), ranging from recreational to competitive, were assessed in a single session using their own bicycles. Before data collection, all cyclists signed an informed consent to participate in the study, which was approved by the University Human Ethics Committee (AUTEC09/178). The sample size was calculated utilising a correlational model aiming for an effect size of ρ > 0.55 (large effect) with α < 0.05 and power of 0.80 using the G*Power statistical package [25]. We based our calculations on the test–retest reliability of joint angles in cycling indicating that a coefficient of determination of 0.30 (i.e. effect size of 0.55) would be detectable when 21 samples are utilised [26]. The rationale for adding five cyclists was to ensure that any issues with processing video files would not result in fewer than 21 cyclists with all data available for statistical analysis.
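To make the arithmetic behind the sample size statement easier to follow, the sketch below converts a coefficient of determination to a correlation and estimates the number of participants with the Fisher z approximation. This is not the exact routine implemented in G*Power, and the function name `approx_n_for_correlation` and the one- versus two-sided choice are assumptions for illustration, so the output need not match the 21 samples reported above.

```python
import math
from statistics import NormalDist


def approx_n_for_correlation(rho, alpha=0.05, power=0.80, two_sided=True):
    """Approximate sample size to detect a correlation of size rho.

    Uses the Fisher z approximation, n = ((z_alpha + z_beta) / atanh(rho))^2 + 3.
    This is an approximation (assumption), not G*Power's exact method.
    """
    if not 0 < abs(rho) < 1:
        raise ValueError("rho must be non-zero and strictly between -1 and 1")
    normal = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    z_alpha = normal.inv_cdf(1 - tail)   # critical value for the chosen alpha
    z_beta = normal.inv_cdf(power)       # quantile matching the target power
    fisher_z = math.atanh(abs(rho))      # Fisher transform of the effect size
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


# A coefficient of determination of 0.30 corresponds to r = sqrt(0.30), about 0.55.
effect_size = math.sqrt(0.30)

n_two_sided = approx_n_for_correlation(0.55, two_sided=True)
n_one_sided = approx_n_for_correlation(0.55, two_sided=False)
print(round(effect_size, 2), n_two_sided, n_one_sided)
```

The approximation gives an estimate of similar magnitude to the value obtained from G*Power, with the exact figure depending on the test direction and the method used.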
After measurements of stature and body mass, cyclists performed 2 min of cycling on their own bicycles attached to a cycle trainer (Kingcycle, Buckinghamshire, UK) at self-selected cadence using their cycling shoes and cleats. Participants were instructed to sustain an intensity equivalent to long duration flat cycling. A digital camera (Samsung ES15, Seoul, South Korea) positioned at the height of their saddle, 4-m away from the bicycles recorded movement in the sagittal plane. Reflective markers were positioned at the greater trochanter, lateral femoral epicondyle, lateral malleolus, and pedal spindle (Fig. 1). Videos were recorded for 20 s at the end of the 2 min of exercise at 30 fps (640 × 480 of frame resolution) using automated quick shutter and anti-shake settings to minimise blur. The option for standard video rather than high speed was selected to simulate specifications of most smartphone video cameras, which are widely used in clinics and sports settings.
Comparison between the TransPose and the MediaPipe methods in relation to reference data (marker tracking) was performed. Pre-trained model weights were obtained from each method’s respective public code release and incorporated into a customised evaluation framework. In this context, the term “pre-trained model weights” refers to the fact that the neural network was previously trained on a separate dataset (as opposed to training the models on our cycling data). Since the existing TransPose model weights were not trained with detailed foot keypoints, this model was further fine-tuned using the Human Foot Keypoint Dataset [17]. The cycling video files were then imported to a customised programme which first located the cyclist using an object detection model (YOLOv5) and then inferred joint centres. An object detection model predicts bounding boxes for objects in an image (these are also referred to as “detections”). Whereas a pose estimation model maps an image to joint keypoints, an object detection model maps an image to object bounding boxes. In the context of this work, we use an object detector to locate the cyclist within the broader image. Separate fine-tuning and detection steps were not necessary for the MediaPipe model. Predicted joint centres (i.e. keypoints) were obtained from both methods and utilised to calculate hip, knee and ankle angles, as shown in Fig. 1. Keypoints were gap filled using a median filter and a moving average was utilised to reduce noise from the automated digitisation prior to angular calculations.
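The keypoint processing described above can be sketched as follows. This is a minimal illustration using NumPy, not the customised programme used in the study: the window length of five frames, the function names, and the use of the included angle between adjacent segments are all assumptions, and the angle conventions defined in Fig. 1 may differ.

```python
import numpy as np


def median_filter_1d(x, window=5):
    """Sliding median that ignores NaNs, so short gaps are filled from neighbours."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    x = np.asarray(x, dtype=float)
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    return np.array([np.nanmedian(padded[i:i + window]) for i in range(x.size)])


def moving_average_1d(x, window=5):
    """Centred moving average with edge padding (output has the input length)."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    x = np.asarray(x, dtype=float)
    padded = np.pad(x, window // 2, mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")


def smooth_keypoint(xy, window=5):
    """Apply the median filter, then the moving average, to an (n_frames, 2) track."""
    xy = np.asarray(xy, dtype=float)
    columns = [moving_average_1d(median_filter_1d(xy[:, k], window), window)
               for k in range(2)]
    return np.column_stack(columns)


def included_angle_deg(proximal, joint, distal):
    """Angle (0-180 deg) at `joint` between the segments to `proximal` and `distal`."""
    a = np.asarray(proximal, dtype=float) - np.asarray(joint, dtype=float)
    b = np.asarray(distal, dtype=float) - np.asarray(joint, dtype=float)
    cross = a[..., 0] * b[..., 1] - a[..., 1] * b[..., 0]
    dot = np.sum(a * b, axis=-1)
    return np.degrees(np.arctan2(np.abs(cross), dot))


# Hypothetical pixel coordinates for one frame: hip, knee and ankle keypoints.
hip, knee, ankle = (0.0, 0.0), (0.0, 1.0), (1.0, 1.0)
knee_angle = included_angle_deg(hip, knee, ankle)
```

Using `arctan2` on the cross and dot products avoids the loss of precision that `arccos` shows for nearly straight segments. The same functions could be applied to the digitised marker coordinates so that both data sources receive identical filtering.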
As a criterion measure, hip, knee and ankle angles were also calculated using reflective markers digitised from each frame. Semi-automatic digitisation was performed using motion analysis software (Skill Spector, Video4Coach, Denmark). The median filter and moving average were also applied to the digitised joint centres to reduce the effects of filtering on comparisons with the marker-less methods. An offset was applied to ankle angles from the marker-less outputs because these angles were measured differently to the criterion method, where the ankle was determined using the pedal axle (see Fig. 1). Data from the two methods and the criterion were sectioned into ten consecutive crank cycles, with the mean temporal series from each cyclist obtained for further analysis.
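Sectioning a joint angle series into crank cycles and averaging them could be sketched as below. The text does not state how cycle boundaries were identified, so the frame indices in `cycle_starts` are assumed to be known (for example, frames where the crank passes top dead centre), and resampling each cycle to 101 points is likewise an assumption.

```python
import numpy as np


def mean_cycle(signal, cycle_starts, n_points=101):
    """Time-normalise each crank cycle to n_points and return the mean waveform.

    signal: 1D array of joint angles, one value per video frame.
    cycle_starts: frame indices where consecutive cycles begin (assumed known);
                  k + 1 indices define k cycles.
    """
    signal = np.asarray(signal, dtype=float)
    grid = np.linspace(0.0, 1.0, n_points)           # 0-100% of the crank cycle
    cycles = []
    for start, stop in zip(cycle_starts[:-1], cycle_starts[1:]):
        segment = signal[start:stop + 1]              # include the closing frame
        phase = np.linspace(0.0, 1.0, segment.size)   # normalised time in cycle
        cycles.append(np.interp(grid, phase, segment))
    return np.mean(cycles, axis=0)


# Synthetic example: 30 fps video at 60 rpm gives 30 frames per crank cycle.
frames = np.arange(0, 301)
angle = 20.0 * np.sin(2 * np.pi * frames / 30.0) + 70.0
starts = list(range(0, 301, 30))                      # 11 indices, 10 cycles
average = mean_cycle(angle, starts)
```

At 30 fps each cycle contains few frames, so linear interpolation is used here only to place all cycles on a common time base before averaging.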
Comparisons of temporal patterns between methods were performed using statistical parametric analyses within the spm1d statistical package (www.spm1d.org) in MATLAB. Paired samples t-tests were conducted to compare each marker-less method in relation to the reference data. Typical errors were calculated for the whole crank cycle for comparisons between methods as the standard deviation of the differences divided by the square root of 2 [27]. Correlation coefficients (with 95% confidence intervals) were also calculated between waveforms in MATLAB. R values were ranked as poor (0–0.5), moderate (0.5–0.75), good (0.75–0.90), and excellent (> 0.9) [28].
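As an illustration of the typical error and correlation calculations, a minimal NumPy sketch is given below. The analyses in the study were run in MATLAB, so this is not the original code. Computing the standard deviation across cyclists at each point of the crank cycle, and the treatment of values falling exactly on a category boundary, are assumptions.

```python
import numpy as np


def typical_error(method, reference, axis=0):
    """Standard deviation of the differences divided by sqrt(2).

    With arrays shaped (n_cyclists, n_points), axis=0 gives one typical error
    per point of the crank cycle (assumed layout).
    """
    diff = np.asarray(method, dtype=float) - np.asarray(reference, dtype=float)
    return np.std(diff, axis=axis, ddof=1) / np.sqrt(2.0)


def pearson_r(x, y):
    """Pearson correlation coefficient between two waveforms."""
    return float(np.corrcoef(np.ravel(x), np.ravel(y))[0, 1])


def rank_r(r):
    """Rank |r| as poor, moderate, good or excellent (boundary handling assumed)."""
    r = abs(r)
    if r > 0.9:
        return "excellent"
    if r >= 0.75:
        return "good"
    if r >= 0.5:
        return "moderate"
    return "poor"


# Synthetic example: four cyclists, three crank positions, differences of +/- 1 deg.
reference = np.full((4, 3), 60.0)
method = reference + np.array([[1.0], [-1.0], [1.0], [-1.0]])
errors = typical_error(method, reference)   # one value per crank position
```

Dividing by the square root of 2 reflects that the difference between two measurements carries the random error of both.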
Results
Correlation coefficients for hip angles between the TransPose method and the reference were 0.97 [excellent, 0.97–0.98, p < 0.01]. For knee angles, correlation coefficients between the TransPose method and the reference were 0.98 [excellent, 0.98–0.99, p < 0.01]. For the ankle angle, correlation coefficients between the TransPose method and the reference were 0.47 [poor, 0.46–0.49, p < 0.01]. For the hip angle, significantly less flexion was observed for TransPose than the reference between 90 and 129° and more flexion was observed between 304 and 331° of the crank cycle (Fig. 2). For the knee, significantly less flexion was also observed for TransPose compared to the criterion between 50 and 110° of the crank cycle (Fig. 3). For the ankle, significantly more plantar flexion was observed for TransPose between 44 and 59° of the crank cycle (Fig. 4).
Correlation coefficients for hip angles between the MediaPipe method and the reference were 0.91 [excellent, 0.90–0.91, p < 0.01]. For knee angles, correlation coefficients between the MediaPipe method and the reference were 0.96 [excellent, 0.95–0.96, p < 0.01]. For the ankle angle, correlation coefficients between the MediaPipe method and the reference were 0.25 [poor, 0.23–0.27, p < 0.01]. For the hip angle, significantly less flexion was observed for MediaPipe than the reference between 90 and 129° and more flexion was observed between 304 and 331° of the crank cycle (Fig. 5). Significantly more flexion was observed for MediaPipe between 0 and 36° and between 175 and 360° of the crank cycle (Fig. 5). The knee was also more flexed for MediaPipe between 147 and 272° of the crank cycle (Fig. 6). For the ankle, significantly more plantar flexion was observed for MediaPipe throughout the crank cycle (Fig. 7).
Typical errors are presented in Figs. 8 (TransPose) and 9 (MediaPipe). For the hip angle, typical errors ranged between 1 and 3° for the TransPose method in relation to the criterion, whilst the MediaPipe method presented errors of 3–6°. For the knee angle, typical errors ranged between 2 and 3° for the TransPose method and 3–6° for the MediaPipe method, as illustrated in Fig. 5. For the ankle angle, typical errors between the TransPose method and the criterion ranged between 3 and 10°, whilst the MediaPipe method differed by 5–9°.
Discussion
This study demonstrated that TransPose presented stronger agreement with, and smaller differences from, the reference method than the MediaPipe method, which partially supports our hypothesis. Major differences between both marker-less methods and the reference data were at the ankle joint. This information is important because MediaPipe is gaining popularity as a result of its versatility, which enables deployment in smartphones and web browsers. However, the magnitude of errors from MediaPipe should be taken into consideration depending on the application.
In prior studies involving walking gait, marker-less methods presented differences between < 1° [12] and 6° [13], which is comparable to findings from the current study for the hip and knee joints. For cycling, Bini et al. observed 3–12° of difference between the MSRA and the reference data [14], which suggests that TransPose and MediaPipe may perform better than the MSRA. It is also important to highlight that these methods performed well when tracking the hip and knee joints but struggled to track the foot, which is why both TransPose and MediaPipe produced poor agreement in determining the ankle angle. Visual inspection of the videos generated by these methods suggests that this was potentially due to increased blur at the foot from lower shutter speed, which challenged the marker-less methods in accurately detecting the toes. An increased shutter speed and a higher quality image sensor should improve the accuracy of these methods in future applications.
There are multiple factors that influence joint angles during cycling, including exercise intensity, cadence and fatigue. Prior research observed that the ankle range of motion increases by ~ 4° and mean ankle angle reduces by ~ 3° when intensity is increased during cycling [29], which suggests that neither of the marker-less methods tested in the current study would be sensitive enough to detect these changes. Another application of joint kinematics is as an input in musculoskeletal modelling. Simulating changes in knee angle of 3–6° in terms of the moment-arm of the vastus lateralis in a publicly available model [30] would result in errors of 0.14–0.30 cm, which could be deemed small. Therefore, it seems possible that both marker-less methods could offer an open-source alternative to subscription-based marker-less software, but further research is required to fully determine the magnitude of these errors in terms of internal loads. There are potential implications for bicycle fitting because most studies recommend a range of knee angles to optimise saddle position (e.g. 30–40°; [31]), which should be detectable by TransPose and MediaPipe. In addition, knee forces do not seem to be sensitive to changes in knee angles of ~ 10–14° [32], which suggests that large changes in cycling kinematics could be detectable by both methods.
The choice of pre-trained neural networks required amendments to TransPose, as this model was not initially prepared to identify foot keypoints. In addition, neither of the marker-less methods has been extensively exposed to cycling images or poses taken purely from the sagittal plane [33]. It is probable that fine-tuning TransPose and MediaPipe with sagittal plane cycling images would further improve their accuracy, particularly when tracking the foot.
The use of two-dimensional video analysis limited data from this study due to possible parallax errors. The choice of a two-dimensional model was based on the wider use of this method in most clinical settings and bike fitting practices, due to the low cost of video recording devices. Even though data from walking gait demonstrated good agreement between two-dimensional marker-less and three-dimensional marker-based methods [34], it is important to assume that there would be ~ 2.2–10° of error in relation to the true movement of cyclists detected using three-dimensional data [35, 36]. Our choice of standard frame rate (i.e. 30 fps) and standard video resolution (640 × 480 pixels) was also in line with the fact that most commercial cameras will be limited in terms of frame rate. Results may improve if frame rate and image resolution are higher than those used in this study.
Conclusions
In summary, the TransPose method presented stronger agreement with a criterion method in determining joint angles than the MediaPipe method. However, poor correlation was observed for the ankle joint for both marker-less methods, which limits their accuracy in tracking this joint.
Data availability
Data will be provided upon request.
References
Sommar JN, Johansson C, Lövenheim B, Schantz P, Markstedt A, Strömgren M et al (2021) Overall health impacts of a potential increase in cycle commuting in Stockholm Sweden. Scand J Public Health. 50(5):552–564. https://doi.org/10.1177/14034948211010024
Atkinson G, Davison R, Jeukendrup A, Passfield L (2003) Science and cycling: current knowledge and future directions for research. J Sports Sci 21(9):767–787
Grassi A, Smiley SP, Di Sarsina TR, Signorelli C, Muccioli GMM, Bondi A et al (2017) Mechanisms and situations of anterior cruciate ligament injuries in professional male soccer players: a YouTube-based video analysis. Eur J Orthop Surg Traumatol 27(7):967–981
Plaza-Bravo JM, Mateo-March M, Sanchis-Sanchis R, Perez-Soriano P, Zabala M, Encarnacion-Martinez A (2022) Validity and reliability of the leomo motion-tracking device based on inertial measurement unit with an optoelectronic camera system for cycling pedaling evaluation. Int J Environ Res Public Health. https://doi.org/10.3390/ijerph19148375
García-López J, del Blanco PA (2017) Kinematic analysis of bicycle pedalling using 2d and 3d motion capture systems. ISBS Proc Arch 35(1):125
Bini RR, Jacques TC, Lanferdini FJ, Vaz MA (2015) Comparison of kinetics, kinematics, and electromyography during single-leg assisted and unassisted cycling. J Strength Cond Res 29(6):1534–1541. https://doi.org/10.1519/jsc.0000000000000905
Fonda B, Sarabon N, Li FX (2014) Validity and reliability of different kinematics methods used for bike fitting. J Sports Sci. 32(10):940–946. https://doi.org/10.1080/02640414.2013.868919
Bini R, Priego-Quesada J (2022) Methods to determine saddle height in cycling and implications of changes in saddle height in performance and injury risk: a systematic review. J Sports Sci 40(4):386–400. https://doi.org/10.1080/02640414.2021.1994727
Bini RR, Senger D, Lanferdini FJ, Lopes AL (2012) Joint kinematics assessment during cycling incremental test to exhaustion. Isokinet Exerc Sci 20(1):99–105. https://doi.org/10.3233/IES-2012-0447
Bini RR, Rossato M, Diefenthaeler F, Carpes FP, Dos Reis DC, Moro ARP (2010) Pedaling cadence effects on joint mechanical work during cycling. Isokinet Exerc Sci 18(1):7–13. https://doi.org/10.3233/IES-2010-0361
Szczerbik E, Kalinowska M (2011) The influence of knee marker placement error on evaluation of gait kinematic parameters. Acta Bioeng Biomech 13(3):43–46
Ong A, Harris IS, Hamill J (2017) The efficacy of a video-based marker-less tracking system for gait analysis. Comput Methods Biomech Biomed Engin 20(10):1089–1095. https://doi.org/10.1080/10255842.2017.1334768
Kanko RM, Laende EK, Davis EM, Selbie WS, Deluzio KJ (2021) Concurrent assessment of gait kinematics using marker-based and markerless motion capture. J Biomech 127:110665. https://doi.org/10.1016/j.jbiomech.2021.110665
Bini R, Serrancolí G, Santiago PRP, Moura F (2021). In: Archive IP (ed) Assessment of a markless motion tracking method to determine body position on the bike. ISBS, 2021, Canberra
Bini RR, Serrancoli G, Santiago PRP, Pinto A, Moura F (2023) Criterion validity of neural networks to assess lower limb motion during cycling. J Sports Sci 41(1):36–44
Xiao B, Wu H, Wei Y (2018) Simple baselines for human pose estimation and tracking. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer vision—ECCV 2018. Springer International Publishing, Cham, pp 472–487
Cao Z, Hidalgo G, Simon T, Wei SE, Sheikh Y (2021) OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. IEEE Transact Pattern Anal Mach Intell. 43(1):172–186. https://doi.org/10.1109/TPAMI.2019.2929257
Millour G, Duc S, Puel F, Bertucci W (2019) Comparison of static and dynamic methods based on knee kinematics to determine optimal saddle height in cycling. Acta Bioeng Biomech 21(4):93–99
Swart J, Holliday W (2019) Cycling biomechanics optimization-the (R) evolution of bicycle fitting. Curr Sports Med Rep 18(12):490–496. https://doi.org/10.1249/JSR.0000000000000665
Pataky TC, Robinson MA, Vanrenterghem J (2013) Vector field statistical analysis of kinematic and force trajectories. J Biomech 46(14):2394–2401. https://doi.org/10.1016/j.jbiomech.2013.07.031
Yang S, Quan Z, Nie M, Yang W (2021) TransPose: keypoint localization via transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 11802–11812
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778
Parmar N, Vaswani A, Uszkoreit J, Kaiser L, Shazeer N, Ku A et al (2018) Image transformer. In: International Conference on Machine Learning. PMLR, pp 4055–4064
Lugaresi C, Tang J, Nash H, McClanahan C, Uboweja E, Hays M et al (2019) Mediapipe: A framework for building perception pipelines. https://doi.org/10.48550/arXiv.1906.08172
Faul F, Erdfelder E, Lang A-G, Buchner A (2007) G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods 39(2):175–191. https://doi.org/10.3758/BF03193146
Burnie L, Barratt P, Davids K, Worsfold P, Wheat J (2020) Biomechanical measures of short-term maximal cycling on an ergometer: a test-retest study. Sports Biomech. https://doi.org/10.1080/14763141.2020.1773916
Hopkins WG (2000) Measures of reliability in sports medicine and science. Sports Med 30(1):1–15. https://doi.org/10.2165/00007256-200030010-00001
Dancey C, Reidy J (2004) Statistics without maths for psychology with psychology dictionary. Pearson Education, Limited, London
Bini RR, Diefenthaeler F (2010) Kinetics and kinematics analysis of incremental cycling to exhaustion. Sports Biomech 9(4):223–235. https://doi.org/10.1080/14763141.2010.540672
Catelli DS, Wesseling M, Jonkers I, Lamontagne M (2019) A musculoskeletal model customized for squatting task. Comput Methods Biomech Biomed Engin 22(1):21–24. https://doi.org/10.1080/10255842.2018.1523396
Bini R, Priego-Quesada J (2022) Methods to determine saddle height in cycling and implications of changes in saddle height in performance and injury risk: a systematic review. J Sports Sci 40(4):386–400. https://doi.org/10.1080/02640414.2021.1994727
Bini RR, Hume PA (2014) Effects of saddle height on knee forces of recreational cyclists with and without knee pain. Int SportMed J 15(2):188–199. https://www.researchgate.net/publication/263587378_EFFECTS_OF_SADDLE_HEIGHT_ON_KNEE_FORCES_OF_RECREATIONAL_CYCLISTS_WITH_AND_WITHOUT_KNEE_PAIN
Bazarevsky V, Grishchenko I, Raveendran K, Zhu T, Zhang F, Grundmann M (2020) BlazePose: on-device real-time body pose tracking. arXiv preprint arXiv:2006.10204
D’Antonio E, Taborri J, Mileti I, Rossi S, Patané F (2021) Validation of a 3D markerless system for gait analysis based on OpenPose and two RGB Webcams. IEEE Sens J 21(15):17064–17075. https://doi.org/10.1109/JSEN.2021.3081188
Fonda B, Sarabon N, Li F-X (2014) Validity and reliability of different kinematics methods used for bike fitting. J Sports Sci 32(10):940–946. https://doi.org/10.1080/02640414.2013.868919
Umberger BR, Martin PE (2001) Testing the planar assumption during ergometer cycling. J Appl Biomech 17(1):55–62. http://journals.humankinetics.com/jab
Acknowledgements
The authors thank all cyclists who volunteered to be part of this study.
Funding
Open Access funding enabled and organized by CAUL and its Member Institutions.
Author information
Contributions
Rodrigo Bini was involved in the data collection, analysis and writing of the paper. Vitor Nascimento and Aiden Nibali contributed to the data analysis and to edits of the paper. All authors reviewed and approved the paper prior to submission.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest with the content of this paper.
IRB approval
AUTEC09/178.
Ethical approval
All methods complied with the Declaration of Helsinki.
Human and animal rights
This study’s methods have been approved by the local ethics committee (AUTEC09/178).
Informed consent
All participants provided written consent to take part in this study.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Bini, R.R., Nascimento, V.B. & Nibali, A. Validity of neural networks in determining lower limb kinematics in stationary cycling. Sport Sci Health 20, 127–136 (2024). https://doi.org/10.1007/s11332-023-01075-7
DOI: https://doi.org/10.1007/s11332-023-01075-7