Introduction

Goniometry employs various measurement tools for the clinical assessment of range of motion (ROM), a fundamental parameter used to evaluate the functionality of human joint movement and mobility1,2. ROM assessment is essential for diagnosing pathology, monitoring its progression, and predicting prognosis in orthopedics and rehabilitation, and its use extends to athletic performance3,4. Therefore, goniometric devices must exhibit high validity and reliability with minimal error, and the technique for using these devices should be easily reproducible5. Unfortunately, obtaining precise and consistent ROM measurements has been extremely difficult owing to the considerable complexity of the anatomy and associated movements6. Consequently, many tools have been developed to measure mobility, ranging from simple visual examination to complex three-dimensional mobility assessment, in support of the Sustainable Development Goals, particularly in the area of Good Health and Well-being. Over time, the development of medical tools for ROM measurement has proceeded in parallel with general developments in technology; examples include the electrogoniometer7; goniometers with short or long arms8; laser projection with a Halo digital goniometer (laser projection used as a goniometric arm)9; photogrammetry software10; digital goniometers11; the Hawk goniometer, a digital-based goniometer with a plastic, parallel-piped sensor and internal gyroscope12; inertial sensors for real-time monitoring13; and smartphone applications (SA), which employ an inertial measurement unit (IMU)-based goniometer14,15. However, any technology that does not provide valid and reliable measurements is not a suitable basis for clinical decisions. Moreover, portability, cost, convenience, and suitability for everyday rehabilitation practice remain gaps to be addressed in the further development of goniometric devices.

To be clinically useful, ROM measurement tools must have confirmed validity and reliability. Studies on the reliability of equipment for measuring ROM16 have shown the influence of instrumentation, procedures, movement direction, body part, and patient type. Peters et al.17 found that the reliability of goniometric devices for assessing ROM was inconsistent from one clinician to another. Validity and reliability can be affected by irregularities during measurement: for example, bony landmark positioning, the accuracy and consistency of the examiner in establishing the zero point, and the positioning of the instrument against the target body segment16,18 may all increase the risk of error. Human soft tissue and the inability to “see” joint centers and bone must also be considered in addition to the measurement properties of goniometric devices, because they significantly affect examiner factors. ROM measurement errors stem from three sources: the device, the examiner, and the patient19,20. Ideally, validity and reliability should be transparently investigated, and errors arising from the equipment should be minimized during the development process.

In the process of developing a ROM measurement device, it is essential to address two of the three primary sources of variability19. First, one must address variability inherent in the capacity of the device to quantify angular differences. Second, one must accommodate variability arising from the examiner’s skill in using a device to measure angles. Thereafter, human-specific factors will also contribute to measurement variability. Carvalho et al.10 addressed this point by using standard angles to remove the human variability factor. They examined the reliability and reproducibility of goniometric measurements, compared with hand photogrammetry, by using standard angles with a wax hand mold. Volunteer examiners were instructed to position the fulcrum of the goniometer, corresponding to the axes of each joint, according to their clinical experience; then, photographic records were taken for analysis. Wellmon et al.19 examined the concurrent validity and interrater reliability of two goniometer mobile applications, the inclinometer (IC), and the universal goniometer (UG) by applying standardized angles from wooden models. This effectively fixed patient factors that can affect repeated measurements, enabling examination of concurrent validity and interrater reliability relating to examiner skill and the accuracy of smartphone devices and applications for determining angular excursion. Unfortunately, the acceptable degree of reliability exhibited by the equipment and examiner without patient factors has not been clearly described in the literature. This gap hinders the development of new goniometric devices by limiting the inventor’s ability to proceed to the next step of conducting a study on human joints.

An in-depth analysis of measurement error, originating from the precision of equipment and the expertise of examiners, in the context of both standard joint assessments and human joint measurements, is imperative. Drawing upon the knowledge gained from widely adopted clinical instruments or the gold standard can provide valuable guidance for the development of new measurement tools5. Although radiographic measurement has been acknowledged as the gold standard21, it results in unnecessary radiation exposure and cannot necessarily be used reliably to measure changes in ROM9. Meanwhile, UG and IC have long been the most extensively used devices in clinical settings because of their portability, low cost, convenience, and reasonable validity and reliability22,23,24. Numerous studies have indicated that the intra- and interrater reliabilities of the UG in the assessment of human joint ROM were excellent, with intraclass correlation coefficient (ICC) values consistently exceeding 0.90 (refs. 8,23,24). Confidence in applying UG to clinical practice was reinforced by its high validity compared with radiographic measurements, as indicated by an ICC value of > 0.90 (refs. 23,24). Although the reliability and validity of IC for measuring human joint ROM varied from poor to excellent, it has been widely used to measure spinal ROM25,26. The digital inclinometer (DI) is portable, accurate, and reliable; therefore, theoretically, it can be applied in practice18,27,28; however, it comes at a higher cost than both the UG and IC29. Recently, our approach to patient management in rehabilitation practice has evolved due to the impact of novel technologies and the use of computer-based applications (apps). A recent systematic review of the validity and reliability of SAs for ROM measurement has sufficiently supported their viability as goniometer substitutes14.
Because the IC can be modified (modified inclinometer [MI]) by attaching a fixing apparatus to free the examiner’s hand, reading the scale, stabilizing the extremity, and guiding movement can be accomplished by one examiner30. MI has been used in a particular rehabilitation approach; however, its validity and reliability have not been strongly confirmed. In summary, the UG, IC, SA, DI, and MI have gained popularity in clinical settings because of their ease of access and compatibility in terms of size and weight, making them convenient choices for diverse applications across different settings. However, each device comes with its own set of advantages and disadvantages, which has led to their selection in different settings (Table 1). Therefore, an analysis of the validity and reliability of these different angular measurement devices constitutes a priority research gap that should be addressed to determine the inherent technical error, which should be taken as a reference while developing any new device.

Table 1 Comparative analysis of the characteristics of five common clinical goniometric devices.

This study aimed to explore the concurrent validity and intra- and interrater reliabilities of five goniometric devices (i.e., UG, IC, SA, DI, and MI) by focusing on examiner factors and the measurement error of the devices. This study was conducted to provide valuable insights into setting thresholds for measurement error to help inventors and practitioners determine the suitability of new devices for joint angle measurements in human subjects.

Methods

Design

This study employed a descriptive, non-experimental study design. Concurrent validity and reliability were evaluated using the test–retest method. Five common clinical goniometric devices (i.e., UG, IC, DI, SA, and MI) were employed to measure standardized ROM (ranging from 0° to 180°) and human shoulder joint flexion angles (ranging from 0° to 180°). These measurements were taken by three examiners during two testing sessions. In the concurrent validity study, the UG was selected as the reference standard for comparison because of its widespread use in clinical practice, aligning with the common practice in similar studies18,19,31.

Raters and samples

All three examiners were physical therapists with > 10 years of experience. To standardize the angles measured in the test–retest, a testing apparatus was developed to simulate the movement of the shoulder joint, which has the largest arc of movement among human joints (Fig. 1). The apparatus consisted of two arms joined together at one end for the axis of movement. The first arm was slightly curved, mimicking the humerus. The second arm was a straight, stationary arm fixed at one end to the wooden base. The axial end of the straight arm held a circular fitting with 16 holes used to fix the two arms in relation to a specific measurement angle. Twelve angles were set, ranging from 0° to 180°. Each angle was measured for 10 trials; thus, there were 120 measurements in total for each examiner with each device.

Figure 1

Example of standard measurement angles and human shoulder flexion angles. Starting and final measurement positions for: (a and b) standard measurement angle; (c and d) human shoulder flexion angle.

During the human joint angle measurement phase, measurements were taken from a group of 20 healthy shoulders, consisting of 10 individuals (5 males and 5 females) with an average age of 23.10 ± 3.25 years, an average weight of 68.70 ± 21.33 kg, an average height of 166.60 ± 6.88 cm and an average body mass index of 24.74 ± 7.49 kg/m². Each shoulder was assessed at 8 different angles, ranging from 0° to 180°. This resulted in 160 measurements for each examiner using each device.

To assess reliability, measurements of each of the three standardized angles and each of the two shoulder flexion angles were analyzed, ensuring that at least 30 heterogeneous samples were examined32. Groups of three sequences of standardized angles and groups of two sequences of human shoulder flexion angle lying in the same quarter of the semicircle were analyzed, as follows: 1st quarter, 0°–45°; 2nd quarter, > 45°–90°; 3rd quarter, > 90°–135°; 4th quarter, > 135°–180°.
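The quarter grouping described above can be expressed as a small helper. This is a minimal sketch for illustration only: the function name `rom_quarter` is hypothetical and not part of the study, and the boundary handling simply follows the intervals as stated (0°–45°, > 45°–90°, > 90°–135°, > 135°–180°).

```python
def rom_quarter(angle_deg: float) -> int:
    """Return the quarter (1-4) of the 0-180 degree semicircle containing an angle.

    Boundaries follow the stated intervals: an angle of exactly 45, 90, or 135
    degrees belongs to the lower quarter.
    """
    if not 0 <= angle_deg <= 180:
        raise ValueError("angle must lie between 0 and 180 degrees")
    if angle_deg <= 45:
        return 1
    if angle_deg <= 90:
        return 2
    if angle_deg <= 135:
        return 3
    return 4


# Hypothetical example angles, not the angle set used in the study.
example_angles = [0, 45, 60, 90, 120, 150, 180]
quarters = [rom_quarter(a) for a in example_angles]
```

Read this way, pooling three standardized angles at 10 trials each would give 30 readings per quarter, and pooling two shoulder flexion angles across 20 shoulders would give 40, which is consistent with the stated minimum of 30 samples.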

Procedures

The same evaluation conditions were maintained for each examiner at each testing session, encompassing both study phases. Before data collection for each phase, all examiners participated in a practice session to clarify the study procedure and measurement methods for all devices. Three examiners (Researchers B.S., N.L., and W.S.) measured the standardized angles and human shoulder joint flexion angles using each device (i.e., UG, IC, DI, SA, and MI) in a random order. Each standardized angle and each shoulder flexion angle of every participant underwent multiple measurements by each examiner using every designated device. The measurements were performed in two testing sessions, with a 2-week gap between sessions for standardized angle measurements and a 2-day gap for shoulder flexion angle measurements. The assignment and order of the 12 standardized angles and 8 shoulder flexion angles for each participant were randomly determined by Researcher S.W. To blind the examiners to the data recorded, readings were taken by a second investigator (Researcher S.K.) and recorded by an assistant researcher. Whole numbers at 1° increments were recorded.

Shoulder flexion angles were established for all participants in the supine position (lying face upwards). To keep the starting position consistent across testing instances, markers were placed on the bed to delineate the positions of the trunk and the testing arm. Shoulder flexion angles were set using a polyvinyl chloride (PVC) pipe, with markings on both the pipe and the bed. The same assistant researcher set the specific shoulder flexion angle for all participants to keep the angle settings accurate and consistent. The examiner then made the final adjustments and aligned the measurement device before taking each measurement.

Goniometric measurements

Figures 2 and 3 show the measurement procedures used for all goniometric devices. To blind the examiner to the readings, the scale, screen or monitor of each device was directed away or covered.

Figure 2

Goniometric devices and procedures for measuring standard angle. Starting and final measurement positions for: (a and b) universal goniometer; (c and d) inclinometer; (e and f) smartphone application; (g and h) digital inclinometer; (i and j) modified inclinometer.

Figure 3

Goniometric devices and procedures for measuring human shoulder flexion angle. Starting and final measurement positions for: (a and b) universal goniometer; (c and d) inclinometer; (e and f) smartphone application; (g and h) digital inclinometer; (i and j) modified inclinometer.

The UG used in this study was a 12-inch transparent plastic model, specifically the Baseline® Model 12-1000 (Fabrication Enterprises, White Plains, NY, USA), featuring a protractor scale, two arms, and a fulcrum. The IC used was a 180° Baseline Bubble® (Fabrication Enterprises), which operates based on fluid levels. These two devices present a 360° scale with 1° increments. To measure angles using the UG, the examiners positioned the fulcrum of the UG on the axis of the apparatus (Fig. 2a and b) or acromion process of the participant’s shoulder and aligned the UG’s stationary and movable arms to the arms of the apparatus or the participant’s humerus and trunk (Fig. 3a and b). For the IC, the examiners positioned the base of the IC against the two arms of the apparatus (Fig. 2c and d) or the participant’s humerus in two consecutive positions (Fig. 3c and d).

The gyroscope-based goniometer was a Samsung Galaxy Note Fan edition smartphone running the Goniometer Records application (Indian Orthopedic Research Group, www.iorg.co.in/2013/05/goniometer-records-mobile-app/). This application was chosen because it is free on Google Play and quite accurate19,33. During the measurement process, the alignment of the smartphone’s edge with either the arms of the apparatus (Fig. 2e and f) or the participant’s humerus was performed in two consecutive steps (Fig. 3e and f).

In this study, the MicroFET® 3 DI (Hoggan Scientific, Salt Lake City, UT, USA) was used. This device was chosen for its versatility, as it can serve as both a handheld dynamometer and a DI. It is known for its cost-effectiveness and ease of implementation in a clinical setting34. For measurements, the examiner placed the device parallel to the stationary arm of the apparatus or the participant’s humerus at the starting position. The angle reading was recorded when the examiner aligned the device with the movable arm of the apparatus (Fig. 2g and h) or the participant’s humerus at the final position and pressed the “Final Setting” button on the side of the device (Fig. 3g and h).

In this study, the MI employed was a gravity pendulum-based IC originally designed as a low-cost goniometer. Modifications were made to this device, including the addition of an adjustable scale and a gravity pendulum reading scale. Furthermore, a fixing apparatus was used with the inclinometer. This design was proposed in order to free the examiner’s hands for controlling unwanted movements during ROM measurements. During measurement, the device was attached to the patient, allowing the examiner to use their hands to support the patient’s movement. To measure the sample angles in this study, the examiner fixed the device to the movable arm of the apparatus or the participant’s arm and set the zero scale when the movable arm remained in its starting position. To ensure that the examiner was blinded, the readings were observed and recorded by a second investigator as the movable arm of the apparatus (Fig. 2i and j) or the participant’s arm moved into the final position (Fig. 3i and j).

Statistical analysis

Descriptive statistics of the 12 standardized angles and 8 human joint angles measured by all examiners using all devices in both testing sessions were calculated. The ICC values of the two-way mixed model were calculated to describe concurrent validity and inter- and intrarater reliabilities. These analyses were performed separately for the two study phases: standard angle measurement and human joint angle measurement. Inter- and intrarater reliabilities were considered in terms of the ICC as follows: poor, < 0.5; moderate, 0.5–0.75; good, 0.75–0.9; and excellent, > 0.9 (ref. 35). As an additional examination of concurrent validity and reliability, the standard error of measurement (SEM) was calculated in relation to the ICC using the following formula: SEM = standard deviation (SD) × √(1 − r), where r is the ICC (refs. 35,36). The SEM is often employed for clinical measurement procedures to avoid intersample variability37. A lower SEM implies greater measurement accuracy. To determine the true changes in ROM (vs. random error), the minimal detectable change (MDC) at the 90% confidence level was calculated using the following formula: MDC = 1.65 × SEM × √2 (ref. 37). To reflect the smallest unit of measurement of all goniometric devices, the MDC values were rounded to the nearest degree. The concurrent validity between two measurement devices was described as reasonable validity when the ICC was > 0.90 (ref. 35). Furthermore, agreement and systematic differences between measurement devices were examined using Bland–Altman plots. Differences relative to the range of true measurements were assessed using 95% limits of agreement (95% LOA), calculated as follows: mean difference between devices ± 1.96 × SD of the differences (refs. 35,38).
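The SEM, MDC, and 95% LOA formulas given above can be illustrated with a short sketch. The function names and the example numbers are hypothetical and are not study data. How to treat ICC values falling exactly on a category boundary (0.5, 0.75, 0.9) is an assumption here, since the stated ranges share their endpoints.

```python
import math
import statistics


def classify_icc(icc: float) -> str:
    """Label an ICC as poor, moderate, good, or excellent.

    Assumption: values exactly on a boundary fall into the lower category.
    """
    if icc < 0.5:
        return "poor"
    if icc <= 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


def sem(sd: float, icc: float) -> float:
    """Standard error of measurement: SD x sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def mdc90(sem_value: float) -> float:
    """Minimal detectable change at the 90% confidence level: 1.65 x SEM x sqrt(2)."""
    return 1.65 * sem_value * math.sqrt(2.0)


def limits_of_agreement(device_a, device_b):
    """Bland-Altman 95% limits: mean difference +/- 1.96 x SD of the paired differences."""
    diffs = [a - b for a, b in zip(device_a, device_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences
    return mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Hypothetical inputs, for illustration only.
example_sem = sem(sd=10.0, icc=0.99)        # about 1 degree
example_mdc = round(mdc90(example_sem))     # rounded to the nearest degree
lower, upper = limits_of_agreement([10, 20, 30], [9, 21, 29])
```

The rounding step mirrors the statement that MDC values were rounded to the nearest degree because the devices record whole degrees.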

Ethics approval and consent to participate

All participants signed a consent form before testing. The study was conducted according to the Declaration of Helsinki and was approved by the Ethics Committee of Burapha University under protocol number HS014/2566(C1) and IRB number IRB1-070/2566.

Results

Tables 2 and 3 show descriptive data for each standardized angle and human joint angle, measured using all five devices (i.e., UG, IC, SA, DI, and MI) during both testing sessions. No significant differences in ROM measurements were observed between raters (F = 0.086, P = 0.918, ES < 0.001; F = 0.142, P = 0.868, ES < 0.001), between devices (F = 0.055, P = 0.994, ES < 0.001; F = 0.232, P = 0.921, ES < 0.001), or between testing sessions (F = 0.091, P = 0.764, ES < 0.001; F = 0.188, P = 0.664, ES < 0.001) for either standardized angle or human joint angle measurements.

Table 2 Mean and standard deviation (SD) of the angle measured by five goniometric devices.
Table 3 Mean and standard deviation (SD) of human shoulder range of motion (ROM) measured using five goniometric devices.

In the analysis of concurrent validity, the ICC, SEM, MDC, 95% LOA, and mean of differences between device pairs are shown in Table 4. All device pairs demonstrated ICC values exceeding 0.99 for both standard angle and human joint angle measurements. For measuring standard angles, the three device pairs that included UG, IC, and DI showed an SEM within 1°, MDC within 2°, and 95% LOA between −2.69° and 3.00°. Device pairs that included SA, MI, and other devices demonstrated a trend toward a greater SEM, MDC, and 95% LOA (0.92°–1.32°, 2°–3°, and −4.11° to 4.04°, respectively). When measuring human joint angles, device pairs that included UG, IC, SA, and DI showed an SEM within 3°, MDC within 8°, and 95% LOA between −10.98° and 8.41°. In contrast, device pairs that included MI and other devices tended to have higher SEM, MDC, and 95% LOA values (4°, 7°–9°, and −10.38° to 11.38°, respectively). The Bland–Altman plots of each pair, demonstrating their scatter, are shown in Figs. 4 and 5.

Table 4 Statistical summary of agreement of all goniometric measurement devices.
Figure 4

Bland–Altman plots of universal goniometer, inclinometer, digital inclinometer and smartphone application when measuring standard angle. Bland–Altman plots comparing: (a) Universal goniometer versus Inclinometer; (b) Universal goniometer versus Smartphone application; (c) Universal goniometer versus Digital inclinometer; (d) Inclinometer versus Smartphone application; (e) Inclinometer versus Digital inclinometer; (f) Smartphone application versus Digital inclinometer; (g) Universal goniometer versus Modified inclinometer; (h) Inclinometer versus Modified Inclinometer; (i) Smartphone application versus Modified inclinometer; (j) Digital Inclinometer versus Modified inclinometer.

Figure 5

Bland–Altman plots of universal goniometer, inclinometer, digital inclinometer and smartphone application when measuring human shoulder flexion angle. Bland–Altman plots comparing: (a) Universal goniometer versus Inclinometer; (b) Universal goniometer versus Smartphone application; (c) Universal goniometer versus Digital inclinometer; (d) Inclinometer versus Smartphone application; (e) Inclinometer versus Digital inclinometer; (f) Smartphone application versus Digital inclinometer; (g) Universal goniometer versus Modified inclinometer; (h) Inclinometer versus Modified Inclinometer; (i) Smartphone application versus Modified inclinometer; (j) Digital Inclinometer versus Modified inclinometer.

Interrater analysis for all measurement devices suggested excellent reliability for each standardized angle and the overall ROM (ICC between 0.980 and 0.999). DI showed the lowest SEM (0.61°–1.05°) and MDC (1°–2°) for each standardized angle. UG and IC had an SEM within 1.48° and MDC within 3°. SA and MI showed a trend toward lower reliability, with a greater SEM (0.59°–1.75°) and MDC (1°–4°) than the other devices. All devices, except for DI, tended to have lower interrater reliability (with a greater SEM and MDC) under wider ROM conditions (Table 5). For human joint angle measurement, all measurement devices exhibited varying levels of interrater reliability across all joint angles, ranging from moderate to excellent (ICC between 0.697 and 0.975; SEM between 1.93° and 4.64°; and MDC between 5° and 11°). All devices exhibited lower interrater reliability when measuring wider ROMs, particularly in the fourth quarter of joint angles, showing moderate reliability (ICC between 0.680 and 0.744; SEM between 3.46° and 4.64°; and MDC between 7° and 11°) (Table 5).

Table 5 Interrater reliability metrics.

Analysis of intrarater reliability (Table 6), with each examiner and overall, demonstrated that all devices had excellent reliability (ICC = 0.977 to > 0.999) for each standardized angle and overall ROM. The DI had the lowest SEM and MDC for each standardized angle (0.56°–0.90° and 1°–2°, respectively), whereas MI had the highest SEM and MDC (1.04°–1.91° and 2°–4°, respectively). The UG, IC, and SA had SEM values within 1.43° and MDC values within 3°. All devices, except for DI, tended to have lower intrarater reliability (with greater SEM and MDC values) under wider ROM conditions. For human joint angle measurement, all measurement devices exhibited varying levels of reliability across all joint angles, ranging from moderate to excellent (ICC between 0.660 and 0.996; SEM between 0.77° and 4.06°; and MDC between 2° and 9°). All devices exhibited lower intrarater reliability when measuring wider ROM, particularly in the fourth quarter of joint angles, showing moderate reliability (ICC between 0.660 and 0.842; SEM between 2.95° and 4.06°; and MDC between 7° and 9°).

Table 6 Intrarater reliability metrics.

Discussion

The present study is the first to explore measurement errors, considering both device and examiner factors, with and without human factors. We conducted a thorough examination of measurement error using five goniometric devices, covering a range of available ROM from 0° to 180° across 12 standard measurement angles and 8 human shoulder joint flexion angles. Our findings can serve as reference values for the development of goniometric equipment, both before and after conducting studies on human joints, while also considering errors from the equipment and examiner objectives.

As a primary objective, we conducted a detailed assessment of concurrent validity to investigate the impact of technology-based device designs on examiner performance. We compared four common measurement devices with UG, a standard clinical tool, across two phases: standard angle measurements and human joint angle measurements. Our analysis in both phases yielded ICC values exceeding 0.99 for all device pairs, demonstrating their reasonable validity35. Additionally, the Bland–Altman plots for each device pair displayed even dispersion along the x-axis, with mean differences ranging from −0.97° to 1.08° for the standard angle measurement phase and from −1.59° to 1.39° for the human joint angle measurement phase. These results suggest that the differences between paired instruments are consistent and that the instruments do not differ significantly39. In the standard angle measurement phase, our findings indicated consistency among each device pair. This aligns with the findings of prior research, confirming the potential of technology-based devices to replace UG without introducing significant variability19,40,41,42. However, when measuring human joint angles, notable discrepancies among the devices were observed, highlighting the substantial influence of technology-based device designs on examiner performance in complex scenarios. Of particular significance, both SA and DI stood out because of their utilization of higher-precision embedded technology, which eliminates the need for examiners to read scales or maintain a final position for scale reading. This feature sets them apart from traditional measurement tools, such as IC and MI, and contributes significantly to their superior performance in measuring human shoulder joint angles18,41,43. Overall, our study highlights the potential of technology-based devices, particularly SA and DI, to replace UG and improve measurement accuracy in complex scenarios, such as measuring human shoulder joint angles.
These findings underscore the critical role of device technology and design in examiner performance. To enhance precision and reliability in clinical measurements, considering these factors when selecting tools is crucial. Additionally, our insights suggest that the development of new goniometric devices with features eliminating the need for reading scales or allowing for fixed final scores for later reading could substantially reduce measurement errors in various research and clinical settings.

For the secondary objective, our reliability analysis consistently revealed excellent inter- and intrarater reliabilities (ICC > 0.90) for all standardized angles and the first three quarters of the human shoulder flexion angle. However, in the last quarter, reliability ranged from moderate to good levels, for both intra- and interrater assessments35. We consistently observed a decreasing trend in both intra- and interrater reliabilities as the measurement angles widened. This trend remained consistent across both phases for all devices, except for DI, when measuring standard angles. Notably, this trend became more pronounced when measuring human joint angles. This aligns with the common understanding that measuring human joint angles involves a complex interplay of factors, including device, examiner, and individual-specific factors, resulting in greater measurement variability. These findings parallel the outcomes of our concurrent validity study, which revealed larger and more dispersed mean differences among device pairs at wider angles, particularly in the fourth quarter of human joint angles. Studies have typically focused on measuring the entire ROM for each joint direction. Although the frequently cited study by Hancock et al. (2018)8 measured three angles of the knee joint, it did not report reliability values for each angle separately. This divergence poses a challenge when comparing our findings with those of previous studies. Furthermore, Wellmon et al. (2016)19 explored interrater reliability for standardized acute, right, and obtuse angles. They reported differences in means for measurements performed using SA, suggesting the potential for clinically meaningful differences to arise when measuring angles > 90°, although they could not provide further clarification. Our results support the findings of Wellmon et al., as four of the goniometric devices exhibited the same trend, except for DI.
This trend can be attributed to the alignment of the goniometric device’s reference part. Notably, we observed that the reference part tended to shift more when the final position significantly deviated from the starting position. This shift primarily occurred due to substantial alterations in soft tissue tension near end-range motion, causing changes in arm shape and consequently affecting reference part alignment. It is imperative to highlight that our study uniquely addressed reliability at various joint angles, encompassing both standardized and human joint angles. However, this trend was not observed when using DI to measure all standard angles. This deviation may be because of the scale-free reading function and the greater width of the DI reference base, which makes it easier to align by placing it on the surface of the apparatus arms at all angles. The clinical implication of this finding is that, when measuring joint angles across a wide range, it is critical to reconfirm reference part alignment for consistency with the starting position, particularly during significant posture changes in end-range motion. These insights hold promise for enhancing the accuracy and reliability of joint angle measurements in clinical and research applications, aligning with the goal of accurately reflecting clinical changes, such as treatment effectiveness or the progression of a condition.

No previous study has reported reference values for instrument-focused measurement error corresponding to common goniometric devices (ICC of inter- and intrarater reliabilities, concurrent validity, SEM, MDC, and 95% LOA). Such reference values are necessary for non-experimental studies on the development of novel prototype goniometric devices. Our report concurred substantially with the findings of previous studies. Chapleau et al.23 examined the reliability and validity of UG compared with those of radiography for ROM measurement of healthy elbows. Regarding concurrent validity, a 95% LOA of ±10.3 (or less) was reported. The ICC for the interrater reliability of UG ranged from 0.95 to 0.97. Wellmon et al.19 studied concurrent validity and reliability by focusing on device and examiner factors and excluded patient factors. They reported an ICC of 0.999 for the concurrent validity of UG and IC, with a 95% LOA ranging from −3.8 to 3.5. The interrater reliability of UG and IC was also excellent (ICC > 0.99). Hancock et al.8 examined the accuracy and reliability of five knee goniometric methods by supporting the limb to maintain knee angles during measurement. They reported excellent intrarater (ICC > 0.98) and interrater (ICC > 0.99) reliabilities, with the minimum significant differences ranging from 6° to 14°, for both short- and long-arm and laser projection-based digital goniometers. Kolber and Hanney30 reported the interrater reliability of IC for identifying posterior shoulder tightness. Excellent reliability (ICC = 0.90) with an MDC of 9° and SEM within 4° was reported. UG, IC, and DI are commonly used in clinical practice16 and have been recommended as the gold standard by numerous studies8,18,19,29,31,44. Therefore, measurement error metrics based on these three devices can be recommended as reference values.
In light of our findings, we conclude that ICC values for inter- and intrarater reliabilities should be > 0.90, SEM should not exceed 2°, and MDC should not be greater than 3°. For concurrent validity, with UG and IC as the reference devices, ICC values should be > 0.90, SEM should not be greater than 1°, and 95% LOA should range from − 3° to 3°. These criteria can set the error limits for measuring standardized joint angles in non-experimental studies of goniometer prototypes.
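As a minimal sketch of how the error metrics discussed above are commonly derived, the following Python example computes SEM, MDC, and Bland-Altman 95% LOA. It assumes the widely used textbook formulas SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM; this section does not state which formulas the study itself applied, so these are assumptions. The function names and the sample readings are hypothetical and are not data from the study.

```python
import math
import statistics


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem: float) -> float:
    """Minimal detectable change at 95% confidence, assuming 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem


def limits_of_agreement(device, reference):
    """Bland-Altman 95% limits of agreement for paired readings in degrees."""
    diffs = [d - r for d, r in zip(device, reference)]
    bias = statistics.mean(diffs)            # mean difference between methods
    spread = 1.96 * statistics.stdev(diffs)  # 1.96 * sample SD of the differences
    return bias - spread, bias + spread


# Hypothetical prototype readings against known standard angles (degrees).
prototype = [31.0, 59.0, 91.0, 119.0]
standard = [30.0, 60.0, 90.0, 120.0]

sem = sem_from_icc(sd=10.0, icc=0.99)  # hypothetical SD and ICC
lower, upper = limits_of_agreement(prototype, standard)
print(f"SEM = {sem:.2f} deg, MDC95 = {mdc95(sem):.2f} deg")
print(f"95% LOA = {lower:.2f} to {upper:.2f} deg")
```

The resulting values can then be compared with the limits proposed above, for example by checking that both LOA bounds lie between − 3° and 3° for standardized angles.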

In the development of new goniometric devices, it is essential to extend accuracy testing to human joint measurements after assessing known angles. This is important because of the variability among individuals, which can have a substantial impact on both the device’s performance and the examiner’s accuracy. Our findings showed that wider joint angles led to increased measurement errors, especially in human joint measurements. This is because of the complex interplay of factors, including tissue tension, changes in limb shape, and misalignment from the starting position. Considering these factors and the specific characteristics of each device, we analyze the sources of measurement error, discuss control methods, and make recommendations for developing more accurate clinical goniometric devices. Incorporating the intra- and interrater reliability and concurrent validity results from our human joint measurements into this analysis is crucial.

UG demands a high level of examiner skill and involves scale reading, although it does not require holding the final position for immediate scale reading (the final score can be fixed and read later). It necessitates aligning three anatomical reference points (the axis and both the stationary and movable arms), which places a premium on detailed anatomical identification. However, this feature is advantageous when realigning the zero-starting position upon reaching the final position. Although this characteristic presents minimal challenges when measuring standard angles because of their clear and easily definable axes and arms, it poses difficulties in measuring human joints, particularly in large joint angle quadrants where defining the axis and reference body parts becomes more intricate. The fluid level inclinometer in this study also requires scale reading, as well as stabilization of the final position for that reading. In addition, it has a short reference base, whereas previous studies identified a positive effect of extending the goniometer arm on measurement accuracy8,45.

The DI demonstrated superior validity and intra- and interrater reliabilities when used to measure standard angles. However, it did not exhibit the same level of superiority when measuring human joints. The wide reference base of the DI facilitated its placement on standard angle arms but did not yield a similar positive effect when measuring human joints. On the other hand, the short reference base and large size of the DI made alignment more challenging, particularly when measuring human joints near the end range. Contrary to previous research findings27,28, our study showed that the reliability of the DI for ROM assessment was lower than that of the UG. However, our findings indicated greater validity and reliability than those of Kolber et al.18, who examined the reliability and concurrent validity of shoulder mobility measurements using a DI compared with those obtained using a UG. They reported an SEM of 2° and an LOA of ± 11°, which are reasonable values for patient measurements. Our MDC values were also lower than those reported by Mohammad et al.29, who noted MDC values ranging from 1.45° to 11.89° when assessing ROM in lower extremity joints. However, a direct comparison is challenging due to differences in angle sources, study populations, and the specific DI model used.

When measuring standardized joint angles, MI exhibited higher measurement errors than the other devices. However, its ICC values exceeded 0.90, indicating excellent concurrent validity and inter- and intrarater reliabilities, which are generally considered acceptable35. MI exhibited slightly inferior concurrent validity and intra- and interrater reliabilities when measuring both standardized and human joint angles. This could be attributed to the need for scale reading and for holding the final position during scale reading. In addition, MI required only an initial reference setting (zero starting) followed by reading the scale at the final position, which limits adjustments to the final alignment. This difference in performance might also be related to partially unstable fixation between MI and the measurement apparatus. Body shape changes occur beneath the fixing apparatus because of the tension of the surrounding soft tissue. This differs from a previous study that measured neck movement with a fixing apparatus (tape) applied around the head, where shape change during measurement was less pronounced42. Clinically, applying the fixing apparatus to areas with minimal shape changes, such as bony prominences, is advisable to ensure more stable measurements.

For standardized angle measurements, SA showed slightly decreased reliability, which is consistent with the findings of prior research highlighting design-related variability, particularly due to rounded edges. This finding agrees with that reported by Wellmon et al.19, who investigated the concurrent validity and interrater reliability of the Goniometer Record and Goniometer Pro applications installed on various smartphones for measuring standardized angles. They considered UG and IC as the reference standards. Their study revealed ICC values for concurrent validity (using both applications) exceeding 0.99 and 95% LOA within ± 4.05°, indicating strong agreement. Interrater reliability was excellent, with an ICC exceeding 0.99. They emphasized the influence of smartphone design on reliability, particularly when placing the smartphone’s edge against a flat testing apparatus surface. When measuring human joint angles, SA exhibited excellent concurrent validity, with an ICC exceeding 0.90, SEM within 3°, MDC within 7°, and 95% LOA within ± 10°. Furthermore, it demonstrated excellent intrarater reliability, with an ICC exceeding 0.90, SEM within 4°, and MDC within 7°, and strong interrater reliability, with an ICC exceeding 0.90, SEM within 5°, and MDC within 9°. These findings in the present human study highlight the superior validity and reliability of SA compared with those of other devices. This can be attributed to the embedded high-precision technology46 and the technique employed, which aligns smartphone reference lines with humerus positioning, effectively mitigating variations caused by nonflat surfaces.
Several factors likely contributed to these excellent results, including the absence of scale reading, the capability to establish references twice (initially at the zero starting position and later at the final position, with the option to adjust alignment in both instances), and the extended length of the smartphone's edge (long side), which enhanced alignment with the humerus2.

In a study by Ockendon and Gilbert47, the validity of a novel smartphone accelerometer-based goniometer was assessed by examining 5°–45° of knee flexion deformity in comparison with a standard Lafayette goniometer. They reported a 95% LOA of ± 7.6°, indicating good agreement. However, earlier studies48,49 have reported varying levels of validity and reliability, ranging from poor to excellent, when using Android and iPhone applications to measure cervical ROM among healthy participants. Chapleau et al.23 conducted a noteworthy study on radiographic elbow measurements, reporting interrater ICC values ranging from 0.98 to 0.99. They recommended that a clinically acceptable maximal measurement error should not exceed 10°. In conclusion, both a gyroscope-based smartphone application (using the Goniometer Records application) and a modified gravity pendulum inclinometer (IC) with a fixing apparatus proved suitable for measuring the feasible range of motion in clinical practice. However, when the validity and reliability of new clinical goniometric devices are tested, SA should be considered a reference device, with its own set of challenges, in the human joint testing phase.

A limitation of our study is that it focused solely on measuring shoulder flexion in one direction. Future studies should consider measuring motion angles in other directions and examining joints with pathological conditions. Additionally, the data obtained here may be used in subsequent finite element studies to help develop more clinically accurate numerical simulations for bioengineering.

Conclusions

Our study provides insights into the capabilities of three examiners to accurately use five commonly used clinical goniometers (i.e., UG, IC, SA, DI, and MI). It focused on device and examiner factors, considering their impact with and without human-specific factors, in order to derive reference values for error quantification and to clarify the objectives that apply when developing a new device for measuring ROM. Testing should start with an examination of known standard angles. We recommend that the ICC of reliability should be greater than 0.90, the SEM should be less than 2°, and the MDC should not be greater than 3°. The most accurate and reliable goniometric measurement devices, in terms of all error metrics, were DI for standardized angle measurements and SA for human joint angle measurements. When developing a new clinical goniometric device and testing its validity and reliability, DI and SA should be considered as reference devices for testing standardized angles and human joint angles, respectively. For standardized joint angles, concurrent validity should meet the criteria of ICC greater than 0.90, SEM less than 1°, MDC within 2°, and 95% LOA within ± 3°. For human joint angles, concurrent validity should meet the criteria of ICC greater than 0.90, SEM less than 3°, MDC within 7°, and 95% LOA within ± 10°. Factors such as the absence of scale reading, the inclusion of a function for fixing the final scale reading, and a sufficiently long reference part may play crucial roles. Moreover, we found that inter- and intrarater reliabilities differed across ROM measurements. We suggest that the concurrent validity and reliability of goniometric prototypes should be studied across the full range of available ROM measurements.
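The concurrent validity criteria recommended above can be expressed as a simple acceptance check for a prototype device. The following Python sketch is illustrative only: the threshold values are taken from the criteria stated in these conclusions, whereas the class name, function name, and example metric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidityCriteria:
    """Concurrent validity limits; angles in degrees (hypothetical container)."""
    min_icc: float      # ICC must be greater than this value
    max_sem: float      # SEM must be less than this value
    max_mdc: float      # MDC must be within this value
    max_abs_loa: float  # both 95% LOA bounds must lie within +/- this value


# Thresholds as stated in the conclusions above.
STANDARDIZED = ValidityCriteria(min_icc=0.90, max_sem=1.0, max_mdc=2.0, max_abs_loa=3.0)
HUMAN_JOINT = ValidityCriteria(min_icc=0.90, max_sem=3.0, max_mdc=7.0, max_abs_loa=10.0)


def failed_criteria(icc, sem, mdc, loa_lower, loa_upper, criteria):
    """Return the names of the metrics that do not meet the given criteria."""
    failures = []
    if not icc > criteria.min_icc:
        failures.append("ICC")
    if not sem < criteria.max_sem:
        failures.append("SEM")
    if not mdc <= criteria.max_mdc:
        failures.append("MDC")
    if loa_lower < -criteria.max_abs_loa or loa_upper > criteria.max_abs_loa:
        failures.append("LOA")
    return failures


# Hypothetical prototype metrics from a human joint testing phase.
metrics = dict(icc=0.95, sem=2.5, mdc=6.0, loa_lower=-8.0, loa_upper=9.0)
print(failed_criteria(**metrics, criteria=HUMAN_JOINT))
print(failed_criteria(**metrics, criteria=STANDARDIZED))
```

Keeping the two sets of limits separate reflects the recommendation to test known standard angles first and human joint angles afterwards, since the acceptable error differs between the two phases.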