Introduction

Torsional disorders of the lower limb are a group of conditions characterised by abnormal twisting of the leg bones, typically the femur or tibia. These disorders are a common source of concern for parents and a leading reason for paediatric consultations with healthcare providers. As many as 10% of the paediatric population may present with a torsional disorder [1]. The most common clinical manifestation is an in-toeing gait, whose clinical evolution is often benign and self-resolving. However, in some cases, it can lead to clumsiness, difficulty in walking or cosmetic disturbances [2], and in certain instances, torsional disorders can alter the biomechanics of gait and cause pathologies such as patellofemoral arthritis and hip dysplasia, among others [3,4,5,6].

The factors that determine the torsional profile of the lower limbs are mainly femoral torsion (FT) and tibial torsion (TT) [7]. These parameters typically stabilise by the age of 9–10 years as the bones mature and rotate externally [8, 9].

Clinical assessment of FT and TT involves various methods, with differing degrees of reliability. FT is commonly measured with the Craig test, also known as the trochanteric prominence angle test, in which external palpation of the greater trochanter is used to assess the rotation [10,11,12]. TT can be evaluated using the posterior surface of the tibial condyles and the transmalleolar axis as reference points. Although these tests can be useful in a clinical setting, their reliability is medium to low [13,14,15] and they are not suitable for accurately monitoring torsions during childhood [11, 13, 15,16,17].

Alternatively, imaging tests represent the most accurate way to quantify femoral and tibial torsion [12, 18,19,20,21,22,23,24]. Different imaging-based techniques have been proposed, including computed tomography (CT), magnetic resonance imaging (MRI), fluoroscopy, biplanar radiology and ultrasonography (US).

CT has traditionally been the “gold standard” for assessing torsional deformities due to its high accuracy and reliability [25, 26]. However, the use of CT is limited by its cost, availability and radiation exposure, making it less suitable for repeated examinations, especially in children [27, 28].

MRI is a highly precise and non-radiating technique [25, 29,30,31], but there is limited data on the reliability and reproducibility of MRI-based torsional measurements [29]. Additionally, the interchangeability of MRI and CT for measuring femoral torsion remains controversial, with both methods being costly and time-consuming [32].

Low-dose stereoscopic X-ray with 3D reconstruction is a fast and accurate alternative, providing a full-body biplanar X-ray in less than 20 s [33, 34]. Its use is increasing in paediatrics because the X-ray exposure is 800 to 1000 times lower than that of CT, and it allows an accurate and safer follow-up of the torsional parameters of the lower extremities [24, 34,35,36]. However, its availability in clinical practice is limited by its high cost and, although the dose is lower than that of CT, it is still a radiating technique.

US is a non-radiating, reliable and accurate alternative for the evaluation of torsional alterations of the lower limb [14, 21, 22, 37,38,39,40], and it is less expensive and less time-consuming compared to CT and MRI. Furthermore, it can be performed in the clinical setting, as it does not require a specialised environment, unlike CT, MRI or X-ray [16, 22]. Different protocols have been described to quantify the femoral and tibial torsion using various anatomical references, showing good reliability in the adult population [22, 37, 41].

Although US may serve as a viable non-radiating alternative for the assessment and monitoring of torsional parameters in children [10, 37], few studies assess femoral and tibial torsion in children, and no systematic review has previously examined the validity and reliability of US in this population.

Objective

This systematic review aims to analyse the validity and reliability of ultrasonography for quantifying femoral and tibial torsion in the paediatric and adolescent population. A second objective is to describe the different anatomical references used for the assessment.

Methodology

This systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines [42]. The protocol was registered in the Prospective Register of Systematic Reviews (PROSPERO) database before starting the database searches (ID: CRD42021290973).

Search strategy

Medline (via PubMed), Web of Science, Scopus and CINAHL databases were searched based on their broad coverage and reputable quality in medical and health-related research. The search was performed from each database’s inception to March 2023, using the terms and strategy presented in Table 1. No restrictions were imposed on publication year or language.

Table 1 Terms and search strategy in the different databases

Eligibility criteria

Studies evaluating the validity and/or reliability of ultrasonography for the assessment of femoral and/or tibial torsion in children and adolescents under 18 years of age were included.

Studies in neurologically impaired populations, in artificial or cadaveric models and in animals were excluded.

Selection of studies

Two authors (XR and AC) independently screened the titles, abstracts and full texts of articles based on inclusion and exclusion criteria. Any discrepancies were resolved by discussion between the two reviewers or by a third author (CE) until consensus was reached.

Data extraction

Data extraction was performed using a standardised template and included: lead author of the study, year of publication, country, participant demographics (sample size, sex and age), intervention characteristics and validity and/or reliability results of the methodology under study (Table 3).

Risk of bias assessment

The methodological quality of all eligible studies was independently reviewed by two authors (XR and GP). Standards for Reporting Studies of Diagnostic Accuracy (STARD) [43] and Quality Assessment of Diagnostic Accuracy Studies (QUADAS) [44] were used according to the protocol proposed by Fernandes De Oliveira et al. [45]. Each criterion was assigned a judgement of ‘yes’, ‘no’ or ‘unclear’. Any disagreement was discussed until consensus was reached or resolved by a third author (CE).

Data analysis

Descriptive statistics (frequency, mean, range and percentage) were used to characterise the participant population and the intervention of the included studies. No meta-analysis could be conducted due to the heterogeneity of data obtained.

Validity was assessed by examining the study design, the statistical measures used and the comparisons with a recognised gold standard [46]. Reliability was evaluated in terms of intra- and inter-observer consistency, using statistical measures to determine agreement among different observers and across repeated trials [46].
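As an illustration of the kind of statistical measures referred to above, the following minimal sketch computes a mean paired difference and a Pearson correlation coefficient between US readings and a reference technique. The function names and the readings are hypothetical and were chosen only for the example; they are not data from any included study, and the included studies may have used other statistics.

```python
from math import sqrt


def mean_difference(us, ref):
    """Mean of the paired differences (US minus reference), in degrees."""
    return sum(u - r for u, r in zip(us, ref)) / len(us)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired femoral torsion readings in degrees (not study data).
us_deg = [22.0, 30.0, 18.0, 41.0, 27.0]
ct_deg = [18.0, 27.0, 12.0, 35.0, 24.0]

bias = mean_difference(us_deg, ct_deg)  # systematic offset of US vs reference
r = pearson_r(us_deg, ct_deg)           # strength of the linear association
print(f"mean difference = {bias:.1f} deg, r = {r:.2f}")
```

A correlation coefficient describes association rather than agreement, which is why a mean difference is usually reported alongside it.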

Results

Out of 1546 articles identified through the search, 30 articles were considered eligible for full-text screening, and 8 studies were eligible for inclusion in this review (Fig. 1). Using the QUADAS & STARD protocols to evaluate methodological quality [43, 44], five of the eight reviewed articles scored 10 or more ‘yes’ responses on the 15 criteria analysed. Two articles scored between 5 and 10, while only one article scored below 5. This article aimed to analyse the reliability but not the validity against a gold standard (Table 2).
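The tally described above can be sketched as follows. This is only an illustrative sketch: the function names are hypothetical, the example judgements are invented rather than taken from Table 2, and the handling of the boundary at exactly 10 'yes' responses (counted here in the top band) is an assumption.

```python
from collections import Counter


def count_yes(judgements):
    """Number of 'yes' judgements among the criteria of one study."""
    return Counter(judgements)["yes"]


def quality_band(yes_count):
    """Band a study by its 'yes' count (boundary handling is an assumption)."""
    if yes_count >= 10:
        return "10 or more"
    if yes_count >= 5:
        return "5 to 9"
    return "below 5"


# Hypothetical judgements for one study over the 15 criteria (not from Table 2).
example = ["yes"] * 11 + ["no"] * 2 + ["unclear"] * 2
band = quality_band(count_yes(example))
print(band)
```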

Fig. 1 Flowchart for the selection of the studies

Table 2 Results of the assessment based on the 15 criteria, 12 from QUADAS [44] and 3 from STARD [43], for the 8 articles included in this systematic review

A total of 286 participants were included, and 566 lower limbs were assessed. Excluding Berman et al. [47], who do not provide this information, the sex distribution of the participants was balanced, with an average of 51.2% females. The age range is very similar in 6 of the 8 studies (from 2 to 17 years), with a mean of 8.4 years. Berman et al. [47] do not provide age data, and Prasad et al. [48] assessed infants between 0 and 6 months of age. Three studies (both studies by Terjesen [38, 49] and that of Elke et al. [50]) include children and pre-adolescents (3–14 years), while those of Gunther et al. [51], Keppler et al. [52] and Tomczak et al. [53] also include adolescents (2 to 17.5 years old). All participants were recruited from children’s hospital clinics. The main characteristics of the included studies are summarised in Table 3.

Table 3 Characteristics of eligible studies included in this systematic review

All the studies assessed femoral torsion, but only Keppler et al. [52] also included the assessment of tibial torsion. In seven of the eight studies, the validity of the US methodology was analysed: it was compared to CT in four studies [47, 51,52,53], to MRI in two studies [51, 53], to X-ray in three studies [38, 49, 50] and to clinical goniometry in one study [51]. Two of the studies analysed the validity by contrasting it with more than one technique [51, 53]. Intra-observer and inter-observer reliability were assessed by four of the studies [48, 50, 51, 53].

Four proximal and two distal references were used for femoral assessment. The most used proximal femoral landmarks were the anterior aspect of the femoral neck [48, 50,51,52] and the anterior tangent between the head and the greater trochanter [38, 49, 53]. For the distal femur, the most used anatomical reference was the posterior tangent of the femoral condyles [49, 52]. However, the most common protocol was to infer the horizontality of these femoral condyles by positioning the leg vertically outside the examination table with the knee flexed 90° [38, 49, 51, 53]. As for the tibial references, the only study included used the posterior tangent of the condyles and the anterior tangent of the distal epiphysis [52] (Table 4).

Table 4 Assessment methodology and results

The highest validity and the highest inter- and intra-observer reliability in the assessment of femoral torsion were found using the anterior tangent between the head and the greater trochanter as the proximal reference and the posterior tangent of the condyles as the distal reference, with the horizontality of the condyles assumed by placing the legs vertically beyond the edge of the examination table with the knees flexed 90° [38, 49, 53].

The mean differences between US and the gold standard ranged from 4° to 6.7°, and the correlation coefficients ranged from 0.71 to 0.81 [53]. It was not possible to compare the validity data of the assessment of tibial torsion with US, as it was studied in only one article [52]. The highest correlation coefficients for both intra-observer and inter-observer reliability were 0.88 [53] (Table 4).

Discussion

The data available about the assessment of femoral and/or tibial torsions using US in children have been assessed in terms of validity and reliability, and the proximal and distal anatomical references have been described. Eight studies were considered eligible and were analysed.

Imaging techniques

All the studies assess FT, but only one of them, Keppler et al. [52], also analyses TT. Four different techniques were used to assess the validity of US. CT is considered the gold standard for the assessment of torsional disorders [29, 36, 54, 55], and it is the technique used in four of the eight studies [47, 51,52,53]. MRI [51, 53], X-ray [38, 49, 50] and clinical assessment [51] were also used.

A fundamental difference between US and the other imaging tests is that only the surface of mature bone can be observed. This limitation results in differences in the reference planes utilised for US assessments compared to those for CT and MRI, which can capture cross-sectional images to identify epiphyseal planes. In addition, US cannot simultaneously capture both proximal and distal bone landmarks, necessitating separate acquisitions. This can be a limitation if the subject moves between assessments, potentially affecting the results. To address this, simple, secure and comfortable fixation systems can keep the child’s limb immobile on the examination table, preventing any movement between the proximal and distal acquisition [22, 56]. In contrast, CT, MRI and X-ray allow the simultaneous acquisition of both references.

Anatomical references

Proximal femur

Since US only shows the cortical surface, the proximal femur angle is usually assessed using a plane tangent to the bone surface, commonly the anterior plane. In contrast, the axis of the femoral neck can be used with both CT and MRI. Thus, most studies refer to “true anteversion” as determined by CT and MRI and “anterior anteversion” as determined by US.

Only Prasad et al. [48] assesses the true anteversion by US, since the studied population comprises children aged 0 to 6 months, in which the ossification of the femoral head is non-existent or very incipient, and thus the entire cartilaginous contour can be seen. However, he concludes that the reliability in the assessment of true anteversion with US is not acceptable at this age. Comparing true and anterior anteversion, Terjesen et al. [38] observes a consistent difference of 5° to 10° higher in the latter. Thus, he suggests a correction factor of 5° in children up to 12 years of age and between 5° and 10° in children over 12 years of age and adults.
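The correction suggested by Terjesen et al. [38] can be written as simple arithmetic. The sketch below is illustrative only and rests on two assumptions that are not stated in the text: that the correction is subtracted from the US anterior anteversion, since the anterior value is reported as the higher one, and that a child aged exactly 12 years falls in the younger group. The function name is hypothetical.

```python
def estimated_true_anteversion(anterior_deg, age_years):
    """Return a (low, high) estimate of true anteversion in degrees.

    Assumptions of this sketch: the correction is subtracted from the US
    'anterior' value; 5 deg is used up to and including 12 years of age,
    and a range of 5 to 10 deg above that age.
    """
    if age_years <= 12:
        return (anterior_deg - 5.0, anterior_deg - 5.0)
    return (anterior_deg - 10.0, anterior_deg - 5.0)


# Hypothetical readings, not study data.
print(estimated_true_anteversion(30.0, 8))   # single corrected value
print(estimated_true_anteversion(30.0, 15))  # corrected range
```

Returning a range for the older group keeps the uncertainty of the 5° to 10° correction visible instead of collapsing it to a single number.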

Anatomical references in proximal femur

The anterior anteversion can be determined using different anatomical references, and the choice may influence the reliability, according to Elke et al. [50]. The references used in the selected studies are the anterior head to trochanter tangent [38, 49, 53] and the anterior aspect of the femoral neck [48,49,50,51,52]. Elke et al. [50] recommend a degree of inclination of the probe over the anterior proximal femur that allows a correct visualisation of the intertrochanteric plane and ensures maximum reliability during the measurement. Alternatively, Terjesen and Anda [49] use two proximal references in their study and conclude that in children it is easier to determine the anterior tangent between the femoral head and the greater trochanter than the anterior tangent of the femoral neck, as the latter is too short. Berman et al. [47], the only authors who do not recommend the general use of US for the assessment of FT, use the posterior aspect of the femoral neck as a reference and find differences greater than 10° compared to CT in 8 of the 19 femurs analysed. The soft tissues between the probe and the bone, and the fact that the authors acknowledge having estimated some of the planes, may have contributed to the low accuracy of the assessments obtained.

Technique with horizontal probe vs inclined probe (tilted transducer technique)

Another feature that influences the reliability is the position of the ultrasound probe on the proximal femur. Berman et al. [47] and Prasad et al. [48] place the probe completely vertically by attaching a spirit level to it. Subsequently, the inclination of the femoral neck is calculated on the image obtained. With this methodology, the accuracy decreases as the torsional value increases, since the lateral area of the femoral neck moves further away from the probe, leading to distortions and measurement errors. To overcome this limitation, Elke et al. [50] propose a variation of the methodology for large true femoral anteversions: the subject is positioned with an internal hip rotation of 40°, which increases the parallelism between the femoral neck and the plane of the probe, which is placed completely horizontal, and these 40° are then added to the value obtained. Terjesen and Anda [49] proposed a variation of the technique in which the probe is tilted over the proximal femur until the proximal reference is displayed horizontally on the screen (tilted transducer technique). If an inclinometer is attached to the probe, it directly provides the degree of inclination of the femoral neck. This is the most widely used technique in subsequent studies.

Distal femur

Different anatomical references have also been used for the distal femur: Keppler et al. [52] and Terjesen and Anda [49] used the posterior tangent of the femoral condyles. Gunther et al. [51], Terjesen et al. [38] and Tomczak et al. [53] used a simpler methodology: the horizontal intercondylar plane is inferred using the tibia as a perpendicular reference. This inference is made by flexing the knee so that the leg hangs vertically beyond the edge of the examination table, on the assumption that a vertical leg implies horizontal femoral condyles. Prasad et al. [48] assess infants (0 to 6 months old) by positioning them in lateral decubitus with the knees flexed 90°, so that the vertical represents the intercondylar plane. This inference of planes may detract from the validity of the methodology, as the assumed complete planar perpendicularity may not exist. On the other hand, it is much more repeatable, which makes it useful for monitoring the same individual, where the initial error in the inference should not vary significantly in successive measurements.
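The way the proximal and distal references combine into a single torsion value can be sketched as follows. This is a simplified illustration under stated assumptions, not the protocol of any included study: both inclinations are taken relative to the horizontal with the same sign convention, the distal inclination defaults to 0° when the horizontality of the condyles is inferred from a vertical leg, and a positioning offset such as the 40° internal rotation described by Elke et al. [50] is added to the result. The function and parameter names are hypothetical.

```python
def femoral_torsion_deg(proximal_incl_deg, distal_incl_deg=0.0,
                        positioning_offset_deg=0.0):
    """Torsion as the angle between the proximal and distal reference planes.

    Assumptions of this sketch: both inclinations are measured against the
    horizontal with the same sign convention; distal_incl_deg stays at 0 when
    the condyles are assumed horizontal; positioning_offset_deg accounts for a
    known hip rotation imposed during the acquisition.
    """
    return proximal_incl_deg - distal_incl_deg + positioning_offset_deg


# Hypothetical readings in degrees, not study data.
inferred = femoral_torsion_deg(25.0)                             # condyles assumed horizontal
measured = femoral_torsion_deg(25.0, distal_incl_deg=3.0)        # condylar tangent measured
rotated = femoral_torsion_deg(5.0, positioning_offset_deg=40.0)  # 40 deg internal rotation
print(inferred, measured, rotated)
```

Comparing the first two calls shows how an unverified assumption of horizontal condyles carries any real condylar inclination directly into the reported torsion.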

Tibial assessment

Out of the selected studies, only Keppler et al. [52] analyses the validity and reliability of the tibial torsion. He uses the posterior tangent of the tibial condyles and the tangent of the distal anterior tibial face as references. As with the proximal femur, some of the reference axes used by CT and MRI to determine tibial torsion, for example the intermalleolar axis, are not detectable by US.

Several studies evaluate US in the assessment of tibial torsion [14, 21, 22, 39], but to the best of the authors’ knowledge, no study has analysed its reliability and validity in a paediatric population without neurological alterations.

The anatomical references used by the authors for the assessment of femoral and tibial torsion are listed in Table 4.

Validity of US vs other imaging tests

Regarding the imaging tests used to analyse the validity of US, Berman et al. [47] and Keppler et al. [52] compare it with CT, Gunther et al. [51] and Tomczak et al. [53] compare it with CT and MRI, and Elke et al. [50] and Terjesen [38, 49] use biplanar radiography.

Gunther et al. [51], Terjesen [38, 49] and Tomczak et al. [53] conclude that US is a good alternative, endowed with sufficient precision for routine clinical examinations and monitoring of FT in children. However, for the preoperative planning of torsional disorders, the use of CT, MRI or biplanar radiography is preferable due to their higher accuracy. One of the major drawbacks of US assessment of femoral and tibial torsion is that both ends of the bony structure are not visualised simultaneously. Thus, those methodologies that use both proximal and distal references to determine torsion require the subject to remain completely immobile during the assessment [47, 49, 52]. Keppler et al. [52] discusses a method of assessment with US that incorporates markers on the probe and 3D reference systems, which makes the assessment independent of the subject’s position or movements. This allows him to also indicate US for pre-surgical assessment of both FT and TT. Only Berman et al. [47] does not recommend the general use of US for FT assessment due to the low validity results obtained. It should be noted that his study was conducted in 1987; the subsequent evolution of US equipment and protocols may have increased the validity of the assessment.

Evaluation of inter- and intra-observer reliability

Inter-observer and intra-observer reliability have been analysed by 4 out of the 8 studies selected [48, 51,52,53]. The results were considered good for FT. Gunther et al. [51] and Tomczak et al. [53] obtain identical reliability data for both inter-observer (r = 0.88) and intra-observer (r = 0.88) assessments. These results are good but lower than the reliability obtained with CT and MRI, with r values greater than 0.95. Furthermore, Gunther et al. [51] also compares it with the reliability of the clinical assessment, which obtains lower inter- and intra-observer values (r = 0.47 and r = 0.77, respectively). In the infant study by Prasad et al. [48], the reliability of FT assessment with US using anterior anteversion as a proximal reference was clearly superior to that obtained using the true anteversion. US observation of the entire cross-section of the head and trochanter in infants is unclear, given the lack of sharpness of the posterior face. This leads to an inaccurate determination of the real axis of the femoral neck and a low reproducibility.

Keppler et al. [52] incorporates a 3D reference system into the US assessment that makes the acquisition independent of patient motion. This method obtains reliability results close to those of CT for both femoral torsion and tibial torsion.

Most of the studies analysed conclude that US is a reliable method for the assessment and follow-up of femoral torsional alterations in children. It is useful due to its immediacy, convenience and safety, as it does not expose the children to ionising radiation. However, the validity and reliability data obtained by MRI, CT and biplanar X-ray are usually superior to US. Thus, MRI and biplanar X-ray would be the methods of choice in the surgical planning of femoral torsional alterations in children, due to their null or low exposure to ionising radiation, respectively. Despite being considered the gold standard, the use of CT is not recommended for follow-up in children due to the high exposure to ionising radiation it represents. Data for TT are only available from the Keppler study [52].

The method proposed by Keppler et al. [52] obtains excellent results in terms of both validity and reliability, only slightly lower than those obtained with CT. Thus, he is the only author who validated US not only for the follow-up of torsional alterations of the lower extremity but also for surgical planning in adults and children. Notably, this study is the only one in which a 3D reference system is used, which clearly increases both validity and reliability.

While some studies report good reliability for femoral torsion, particularly in paediatric populations, there is limited data available for tibial torsion assessment in children without neurological alterations. Future research should address this gap to provide a more comprehensive understanding of the reliability of US in assessing lower extremity torsional abnormalities in the paediatric population.

Strengths and limitations

In this systematic review, all articles were included based on the selection criteria, without any restrictions on the year or language of publication. Except for the paper by Prasad et al., all the included articles were published in the last century. This highlights the paucity of recent studies addressing the validity and reliability of US in evaluating lower extremity torsional abnormalities in the paediatric and adolescent population, particularly in relation to tibial torsion. Additionally, because of their age, many of the selected articles present outcome analysis strategies that do not align with current standards.

To improve the precision and effectiveness of our review, we have revised the protocol originally registered with the PROSPERO Systematic Reviews database under ID CRD42021290973. While the initial protocol included participants of all ages to ensure a comprehensive search scope, it primarily aimed to examine children and adolescents. To better align with our research goals, the revised protocol now specifically targets this younger demographic. Furthermore, we have replaced the Downs & Black checklist with the QUADAS & STARD protocol by Whiting et al. [44], which better suits the types of studies under consideration.

Conclusions

All the studies included in this systematic review assessed femoral torsion, but only one of them also included the assessment of tibial torsion. Seven out of the eight studies evaluated the validity of the ultrasound (US) methodology by comparing it with CT, MRI, X-ray or clinical goniometry. Four studies examined intra-observer and inter-observer reliability.

The validity and reliability of US in the assessment of lower limb torsions can be classified from acceptable to high depending on the measurement protocol and the anatomical references used.

US has proven to be a valuable tool in the routine assessment and follow-up of femoral torsional alterations in children and adolescents. Its advantages lie in its safety, cost-effectiveness and the immediacy of results, which can be particularly beneficial in a clinical setting where timely diagnosis and treatment are crucial.

However, while US is accurate and reliable enough for clinical assessment of lower extremity torsion disorders, its suitability for surgical planning is controversial. Three authors recommend more accurate imaging techniques, such as MRI and biplanar radiography. Only one author, who used a 3D reference system, achieved sufficiently precise results to consider US a suitable technique for surgical planning.

These findings have significant clinical implications. They suggest a need for a multi-modal imaging approach in managing femoral torsional disorders in children, with US serving as a first-line tool for initial assessment and follow-up and MRI or biplanar radiography being reserved for pre-surgical planning.

Finally, this systematic review highlights the lack of recent studies analysing the validity and reliability of US for assessing tibial and femoral torsion in children and adolescents, especially TT. This gap is particularly significant for a technique whose devices have evolved considerably over the years. It marks an area for future research, which could potentially lead to improved diagnostic and treatment strategies for this patient population.