Abstract
Measuring the range of motion (ROM) is one of the important tasks in the medical and healthcare sectors. However, person-to-person measurement is time-consuming and resource-intensive. In this paper, we propose an approach to estimate ROM using a machine learning algorithm combined with computer vision, based on data-driven experiments. We describe the setup used to gather an experimental dataset for learning human joint angles from 2D data. Through extensive experiments and a multiple linear regression learning approach, our method reduces estimation error by 11.1% on average. We conclude that machine learning-based, data-driven approaches can predict ROM better than using a vision camera alone.
1 Introduction
The measurement of the range of motion (ROM) is critical for healthcare and clinical treatment professionals. The importance of ROM is also increasing in the field of rehabilitation exercises (RE) after surgery, disability, accidents, and so on. Measurement devices for ROM (MDROM) yield reliable ROM information only when the values are measured by well-trained and experienced persons with relatively high precision. However, person-to-person (P2P) measurement with MDROM has several drawbacks. First, P2P with MDROM requires resources such as education and training for professional knowledge, which entails cost and time. Second, it imposes constraints, since at least two persons must meet at the same time in the same physical space. Third, even when the measurer is well trained, the results are not always consistent, since they depend on the individual.
With the recent rapid progress of artificial intelligence (AI) in image processing, there is a possibility that these shortcomings can be compensated for. One AI-based image processing task is detecting pose landmarks in an image or video stream. MediaPipe provides open application programming interfaces (APIs) [3], called the ‘ML kit’, to software developers for building applications quickly and easily. The right figure of Fig. 1 is an example produced by the ‘ML kit pose detection API’ (we use ‘ML-kit’ for brevity hereinafter). ML-kit provides the coordinates of 33 body landmarks covering the entire body (face, arms, legs, etc.). Each coordinate consists of three values representing the x-, y-, and z-axis of a landmark.
Image processing powered by machine learning algorithms shows strong potential to replace P2P MDROM. Researchers have also shown practical implementations in various areas such as underwater running [4], markerless systems [5], pose matching [6], and measuring single-leg squat kinematics [7].
Besides medical and rehabilitation settings, RE machines are already on the commercial market. For example, a multi-purpose device covering rehabilitation, therapy, and fitness is introduced in [8], along with mobility-enabled rehabilitation exercise equipment [9], a rehabilitation machine for people with disabilities [10], and a home healthcare device with IoT [11]. However, most RE machines do not provide ROM functionality based on computer vision systems. We argue that the main reason is that computer vision output has not been reliable enough, in terms of accuracy compared to P2P approaches, to implement on RE machines.
In this paper, we focus on facilitating ROM measurement with vision systems on RE machines using machine learning. During our research project, we developed new RE machines focused mainly on upper-body rehabilitation exercise/training. To exploit computer vision, we installed camera devices and developed analysis software for various rehabilitation motions. One challenging problem is that, since the images from rehabilitation training are in 2D coordinates, it is not trivial to estimate body angles in the actual 3D coordinates of the real world. The easiest approach is to use multiple cameras to measure ROM on the RE machine. However, mounting several cameras is disadvantageous in terms of operation, maintenance, data processing on the computing devices inside the RE machine, and portability. We address this problem using machine learning to overcome the insufficient 2D information about the 3D world.
2 Problem Definition
2.1 Problems of 2D Information in 3D World
To provide a better experience and systematic rehabilitation exercise, we developed an RE machine that can monitor and analyze 13 types of motion based on a computer vision system. We report the supported rehabilitation actions in Table 1. The RE machine consists of exercise devices, an information screen, computing devices, and a camera. A detailed figure is shown in Fig. 2.
We utilized ML-kit to detect landmarks of the human body; it extracts the 3D coordinates of 33 landmark points in real time (for more detailed information, refer to [12]). ML-kit produces 3D coordinates as tuples (x, y, z), where x and y are the landmark coordinates normalized to [0, 1], and z is the depth relative to the midpoint of the hips.
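Flattening a subset of these landmark tuples into an observation vector can be sketched as follows. The indices #11–#16, #23, and #24 are the standard MediaPipe pose indices for the shoulders, elbows, wrists, and hips; the function and variable names are ours, and the toy pose is for illustration only:

```python
# Standard MediaPipe pose landmark indices for the eight points used here.
LANDMARKS = {
    11: "left_shoulder",  12: "right_shoulder",
    13: "left_elbow",     14: "right_elbow",
    15: "left_wrist",     16: "right_wrist",
    23: "left_hip",       24: "right_hip",
}

def to_feature_vector(pose):
    """Flatten the (x, y, z) tuples of the eight landmarks into a 1x24 vector.

    `pose` maps a landmark index to an (x, y, z) tuple, with x and y
    normalized to [0, 1] and z the depth relative to the hip midpoint.
    """
    vec = []
    for idx in sorted(LANDMARKS):
        x, y, z = pose[idx]
        vec.extend([x, y, z])
    return vec

# Toy pose: every landmark at the same point, for illustration only.
toy_pose = {idx: (0.5, 0.5, 0.0) for idx in LANDMARKS}
assert len(to_feature_vector(toy_pose)) == 24
```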
The problem that arises here is that if we consider only the 2D image, the ROM estimate has errors by nature. For example, if we measure the shoulder-elbow-wrist angle during the left elbow anterior up-and-down motion (Motion #5 in Table 1) using only one camera as shown in Fig. 1, the angle equals 185°. However, if we measure the angle after moving the camera to the left side of the person, the angle equals 118°, which is an unacceptable discrepancy. For an intuitive understanding, we portray this problematic situation in Fig. 3.
The measured angles are 185° and 118° in the left- and right-hand figures of Fig. 3, respectively. Note that we applied the well-known equation for computing the angle between three points in 2D space, stated in Eq. (1):

\(\theta =\left|\mathrm{atan2}\left({c}_{y}-{b}_{y},{c}_{x}-{b}_{x}\right)-\mathrm{atan2}\left({a}_{y}-{b}_{y},{a}_{x}-{b}_{x}\right)\right|\)  (1)

where \(a, b, c\) are the 2D coordinates such that \(a=\left({a}_{x},{a}_{y}\right)\), \(b=\left({b}_{x},{b}_{y}\right)\), \(c=({c}_{x},{c}_{y})\), \(b\) is the vertex of the angle, and \(\left|\cdot \right|\) is the absolute value function.
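A direct implementation of this three-point angle computation can be sketched as follows. The function name is ours, and we assume an atan2-based formulation, which, consistent with the 185° reading above, can return reflex angles greater than 180°:

```python
import math

def angle_2d(a, b, c):
    """Angle at vertex b formed by points a and c in 2D, in degrees [0, 360)."""
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0])
        - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    return ang + 360.0 if ang < 0 else ang

# A right angle at the origin:
assert abs(angle_2d((1, 0), (0, 0), (0, 1)) - 90.0) < 1e-9
```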
This example is just one situation among the 13 rehabilitation motions. To mitigate errors arising from the information loss between 2D and 3D space, we introduce a machine learning approach in this paper.
2.2 Problem Formulation
Given a dataset \(D\), we want to learn to predict more precise coordinate information to estimate ROMs. As our notation, we define the observed dataset \(D={\left\{\left({o}_{i}, {y}_{i}\right)\right\}}_{i=1}^{N}\), where \({o}_{i}\) is the observed information expressed as a vector, \({y}_{i}\) is the label (true value), \(\left(\cdot \right)\) is a tuple containing a pair of data, and \(N\) is the total number of data tuples. The maximum dimension of our data is \({\mathbb{R}}^{1\times 24}\), consisting of 24 \(=8\,points\,(landmarks)\times 3(x, y, z)\). As a first step, we apply a simple machine learning algorithm, multiple linear regression (MLR), which is simple but fast to train. We use the root mean squared error (RMSE) to evaluate our approach, stated in Eq. (2):

\(\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}}\)  (2)

where \({\widehat{y}}_{i}\) is the value predicted by MLR.
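Equation (2) can be implemented directly; a minimal sketch (the function name is ours):

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error over paired scalar values, as in Eq. (2)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Identical predictions give zero error.
assert rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) == 0.0
```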
3 Experimental Setup
To gather \(D\), we conducted experiments for each rehabilitation motion stated in Table 1. The experiments were conducted with 13 test subjects (refer to Table 2). Each test subject performed each rehabilitation motion 5 times.
To gather various possible camera positions, we installed four additional cameras at different distances, heights, and angles relative to the test subject. In total, we had five camera configurations, including the rehabilitation measuring equipment (marked as ME in Fig. 4). The five formations correspond to the 9, 10, 12, 2, and 3 o’clock directions from the test subject’s line of sight, as portrayed in Fig. 4. Each camera recorded and saved video to remote storage (a server) simultaneously to yield the machine learning dataset \(D\). After finishing all experiments, we applied the ML-kit algorithms to extract all possible coordinates among the 33 landmarks. Since the cameras record at 60 FPS (frames per second), we extracted 60 images per second, securing 1,816,704 images from the experiments. From these images, we extracted all landmarks.
ML-kit extracted 1,773,234 landmark coordinate sets after dropping NONE values (frames for which ML-kit failed to return coordinates), i.e., \(N=\mathrm{1,773,234}\) in \(D={\left\{\left({o}_{i}, {y}_{i}\right)\right\}}_{i=1}^{N}\). Each \({o}_{i}\) is a vector in \({\mathbb{R}}^{1\times 24}\) consisting of the 3D coordinates (\(x, y, z)\) of landmarks #11, #12, #13, #14, #15, #16, #23, and #24 in Fig. 4. To acquire \({y}_{i}\) (the label, or equivalently ground truth, values), we built a mapping table for each action in Table 3.
We set \({y}_{i}\) to be in \({\mathbb{R}}^{1\times 12}\), since the final prediction task in this paper is the right and left elbow angle. To compute the right and left angles, we apply Eq. (1) to landmarks #12, #14, #16 for the right elbow and #11, #13, #15 for the left elbow angle, respectively. We divided \(D\) into training and test sets with a ratio of 80% to 20%, respectively. Finally, we trained the MLR model on the training set and predicted \({y}_{i}\) after training (i.e., after all training steps).
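The training step can be sketched as a plain least-squares fit via the normal equations. The function names and the toy 1-input/1-output data are ours; in the paper's actual setting the inputs are 24 coordinates and the outputs are 12 values, and any off-the-shelf linear regression solver would serve equally well:

```python
def fit_mlr(X, Y):
    """Fit multi-output linear regression weights W (with a bias term) by
    solving the normal equations (X^T X) W = X^T Y via Gauss-Jordan
    elimination with partial pivoting."""
    Xb = [row + [1.0] for row in X]        # append bias feature
    d, m, n = len(Xb[0]), len(Y[0]), len(Y)
    A = [[sum(r[i] * r[j] for r in Xb) for j in range(d)] for i in range(d)]
    B = [[sum(Xb[k][i] * Y[k][j] for k in range(n)) for j in range(m)]
         for i in range(d)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        B[col], B[piv] = B[piv], B[col]
        for r in range(d):
            if r != col and A[r][col]:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
                B[r] = [a - f * b for a, b in zip(B[r], B[col])]
    return [[B[i][j] / A[i][i] for j in range(m)] for i in range(d)]

def predict(W, x):
    """Apply the fitted weights to one input row (bias appended)."""
    xb = x + [1.0]
    return [sum(xb[i] * W[i][j] for i in range(len(xb)))
            for j in range(len(W[0]))]

# Toy check: targets follow y = 2x + 1 exactly, so MLR recovers the map.
W = fit_mlr([[0.0], [1.0], [2.0]], [[1.0], [3.0], [5.0]])
assert abs(predict(W, [1.5])[0] - 4.0) < 1e-9
```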
4 Result Analysis
4.1 Prediction Performance in RMSE
We predict \({y}_{i}\) of the test set (landmark #11, #12, #13, #14, #15, #16, #23 values) from \({o}_{i}\) of the test set via the trained MLR model. Let the predicted value be \({\widehat{y}}_{i}\). We then compute the RMSE between \({y}_{i}\) and \({\widehat{y}}_{i}\) using Eq. (2). The overall error is 0.07064; the smallest error, 0.02726, occurs in the right wrist \(x\)-coordinate, and the largest error, 0.11332, in the left elbow \(y\)-coordinate. The RMSE comparison is shown in Table 4, with a visual representation in Fig. 5.
4.2 Prediction Errors Between Left- and Right-Hand
To compare the prediction behavior between the right and left sides, we report scatter plots of the shoulder and elbow in Fig. 6. We omit the other plots due to the page limit.
4.3 Performance of ROM Measurement
Since only elbow angles are considered among the ROMs in this study, we averaged all RMSEs using the observed coordinates and the MLR-predicted coordinates, respectively. Note that the ground truth is \({y}_{angle\_from\_groundTruth}\), computed from \(y\) in Sect. 3, and we have two estimates, \({\widehat{y}}_{angle\_from\_observe}\) and \({\widehat{y}}_{angle\_from\_MLR}\), for angle comparison. The comparison between the two angle estimations is reported in Table 5.
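This comparison amounts to computing the elbow angle once from the raw observed coordinates and once from the MLR-corrected coordinates, and scoring each against the ground-truth angle. A minimal sketch (function names and toy coordinates are ours; for brevity it uses the unsigned dot-product angle rather than Eq. (1)):

```python
import math

def angle_at(a, b, c):
    """Unsigned angle at vertex b between points a and c, in degrees [0, 180]."""
    ux, uy = a[0] - b[0], a[1] - b[1]
    vx, vy = c[0] - b[0], c[1] - b[1]
    cos_t = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def angle_rmse(true_triples, est_triples):
    """RMSE between angles from ground-truth and estimated
    (shoulder, elbow, wrist) coordinate triples."""
    errs = [(angle_at(*t) - angle_at(*e)) ** 2
            for t, e in zip(true_triples, est_triples)]
    return math.sqrt(sum(errs) / len(errs))

# Toy case: the "MLR-corrected" wrist lies closer to ground truth than the
# raw "observed" wrist, so its angle RMSE is smaller.
truth     = [((0.0, 1.0), (0.0, 0.0), (1.0, 0.0))]
observed  = [((0.0, 1.0), (0.0, 0.0), (1.0, 0.3))]
corrected = [((0.0, 1.0), (0.0, 0.0), (1.0, 0.05))]
assert angle_rmse(truth, corrected) < angle_rmse(truth, observed)
```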
5 Conclusions
Estimating ROMs as accurately as possible is an important task in rehabilitation exercise. Since computer vision technology is well suited to detecting human landmarks, it is promising for motion analysis in terms of cost and time savings. However, detecting and measuring the landmarks accurately is not yet trivial. We addressed a way of measuring ROMs using a machine learning approach. In this paper, a simple MLR model was used to learn human elbow angles from data. We described how machine learning approaches can be applied and how to generate training and test datasets. Even though the test cases and test subjects are limited, we believe the potential of our approach has been sufficiently demonstrated. Through implementation and experiments, we showed that machine learning-based ROM estimation is a viable approach for enhancing accuracy. Testing and evaluating more advanced algorithms in our domain could achieve further improvement; we leave this as future research.
References
1. PhysioTalk: ROM evaluation – Arm. https://youtu.be/jKkuq3DIUkE
2. ML Kit: Pose detection. https://developers.google.com/ml-kit/vision/pose-detection
3. Lugaresi, C., et al.: MediaPipe: a framework for building perception pipelines. arXiv preprint arXiv:1906.08172 (2019)
4. Cronin, N.J., et al.: Markerless 2D kinematic analysis of underwater running: a deep learning approach. J. Biomech. 87, 75–82 (2019)
5. Colyer, S.L., Evans, M., Cosker, D.P., et al.: A review of the evolution of vision-based motion analysis and the integration of advanced computer vision methods towards developing a markerless system. Sports Med. Open 4, 24 (2018)
6. Qiu, Y., et al.: Pose-guided matching based on deep learning for assessing quality of action on rehabilitation training. Biomed. Sig. Process. Control 72, 103323 (2022)
7. Haberkamp, L.D., Garcia, M.C., Bazett-Jones, D.M.: Validity of an artificial intelligence, human pose estimation model for measuring single-leg squat kinematics. J. Biomech. 144, 111333 (2022)
8. HealthCare: PhysioGait. https://hcifitness.com/collections/products/
9. AC Mobility: MOTOmed. https://acmobility.com.au/motomed-rehabilitation-exercise-equipment/
10. BCIT: Introducing the AAPLEwalk. https://www.bcit.ca/applied-research/makeplus-product-development/rehabilitation-engineering-design-lab/aaplewalk/
11. Tonal: A home gym built to work as hard as you. https://www.tonal.com/equipment/
12. Google: MediaPipe. https://github.com/google/mediapipe
Acknowledgements
This study was supported by the Translational R&D Program on Smart Rehabilitation Exercises (NCR-TRSRE-Eq. 01A), National Rehabilitation Center, Ministry of Health and Welfare, Korea.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2023 The Author(s)
Cite this paper
Noh, G., Ahn, J., Jeoung, B. (2023). Enhancing the Measurement of the Range of Motion Using Multi-camera Learning Approaches. In: Jongbae, K., Mokhtari, M., Aloulou, H., Abdulrazak, B., Seungbok, L. (eds.) Digital Health Transformation, Smart Ageing, and Managing Disability. ICOST 2023. Lecture Notes in Computer Science, vol. 14237. Springer, Cham. https://doi.org/10.1007/978-3-031-43950-6_25
Print ISBN: 978-3-031-43949-0. Online ISBN: 978-3-031-43950-6.