Background

The medial longitudinal arch is a complex structure consisting of static and dynamic components such as muscles, bones, tendons, and ligaments [1, 2]. It has important functions in bipedal gait, such as providing support and absorbing impact during walking or running [3]. It develops with a broad spectrum of normal variation, primarily in the first decade of life [4]. A pathological increase or reduction of the medial longitudinal arch, known as pes cavus or pes planus, respectively, has been identified in previous literature as a potential risk factor for acute injuries and for chronic problems in later life, such as muscle imbalances, pain, functional impairment, or gait alterations [5,6,7,8,9]. An “atypical” arch shape in a child, noticed by the parents, is one of the leading reasons for consultations with pediatric orthopedic specialists [4].

A classification of medial longitudinal arch morphology is essential to differentiate between foot types [10]. Clinically identifying foot types early in life may help to estimate the need for interventions. To treat patients with foot-related problems or to assess changes in the foot arch during children’s growth, an accurate and reliable diagnostic system is mandatory [11]. Frequently used assessments of the medial longitudinal arch include clinical observation, radiographic analysis, anthropometric measurements, and footprint assessments via digital photography or pedobarography [12,13,14,15,16,17,18]. However, to date, none of these measures is generally accepted as the “gold standard” [19]. This may be partly due to the lack of standardized classification systems for foot shape or deformities [12]. Because they characterize the foot arch in a simple, fast, safe, reliable, non-invasive, and inexpensive way, footprint measurements have often been described as the method of choice [3]. Commonly used objective measures of the medial longitudinal arch are footprint indices such as the arch index, Staheli index, Chipaux-Smirak index, truncated arch index, footprint index, or alpha angle [19,20,21,22,23].

Another efficient way to assess foot arch characteristics is to use static anthropometric measurements such as foot length, dorsum height, or navicular height [24]. In clinical settings, examination by a physician is a widely used procedure for assessing the foot [11].

To our knowledge, no study has examined the inter-rater agreement of clinical assessments of the medial longitudinal plantar arch in children or adolescents, whereas a few studies have reported conflicting results for the adult population [11, 25, 26]. According to Cowan et al. [25], clinical ratings seem to be influenced by the expert’s experience and clinical specialization, and by the foot morphology itself. Furthermore, it is unknown whether the above-mentioned objective foot arch assessments agree with clinical expert ratings.

Given these research gaps, the aim of this study was to examine the inter-rater agreement of clinical assessments of the medial longitudinal plantar arch in children.

Furthermore, the relationship between subjective clinical foot ratings and objective static and dynamic foot arch measurements was examined.

Methods

Participants

Participants were recruited at a summer sports camp during school holidays. Children between 5 and 13 years of age who were able to walk independently were included. Exclusion criteria were foot deformities such as pes equinovarus or fixed pes equinus, surgery to the musculoskeletal system within the past 6 months, or acute injuries of the lower extremity. The final sample consisted of 74 children (34 girls and 40 boys) with a mean ± SD age of 9.0 ± 1.81 years (range, 5–12 years). The mean ± SD height was 1.39 ± 0.14 m (range, 1.12–1.75 m), the mean ± SD weight was 32.2 ± 10.9 kg (range, 20.1–74.9 kg), and the mean ± SD body mass index (BMI) was 17.3 ± 2.4 kg/m² (range, 13.6–29.7 kg/m²). The parents signed an informed consent form, and the ethics committee of the Medical Chamber of Hamburg approved the study (PV4971).

Clinical Expert Rating

The clinical foot arch ratings were completed by eleven foot experts: nine pediatric orthopedic surgeons and two experienced physical therapists. Some of the experts were randomly selected from an online database of orthopedic surgeons and recruited via e-mail invitation; the others were invited to participate during an orthopedics conference. To reduce bias, the experts were blinded to the study’s objective. Inclusion criteria for experts were active practice in a clinical setting, a specialization in orthopedics, and at least 5 years of professional experience. Physicians and physical therapists who were not in active practice or had limited expertise were excluded. The mean ± SD professional experience of the foot experts was 10.8 ± 5.9 years (range, 5–21 years). Every foot expert gave written consent to participate in this study.

A series of photographs was used for the clinical experts’ rating of the foot arch.

Photos of every participant’s right foot were taken during upright standing with a digital camera (Nikon D3100, resolution 14.2 MP) and a standardized recording setup with fixed camera positions. Additionally, one photograph was taken of each child standing on the tips of the toes of the right foot (Fig. 1). The pseudonymized photographs were uploaded to a file hosting service, and the experts received access to them after consenting to participate in the study.

Fig. 1

Example of photographs of the participant’s right foot

For the clinical foot assessment, experts were asked to complete a questionnaire categorizing each participant’s foot as high, normal, or low arched.

Static and Dynamic Foot Arch Measurement

For the static foot arch measurements, a specially constructed foot platform (Fig. 2) was used to measure heel-to-toe length (HTL) and dorsum height at 50% of HTL (DH) in sitting and standing positions. The static arch height index (AHI) was calculated by dividing the DH by the HTL [16, 24].
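As a worked example of the AHI definition, the following minimal sketch computes the index from HTL and DH. Both lengths must share a unit, so the index is dimensionless. The `StaticFootMeasurement` container and the readings are hypothetical, chosen only to fall within the ranges reported in the Results:

```python
from dataclasses import dataclass

@dataclass
class StaticFootMeasurement:
    """One static platform reading in millimetres (hypothetical container)."""
    heel_to_toe_length_mm: float  # HTL
    dorsum_height_mm: float       # DH, measured at 50% of HTL

def arch_height_index(m: StaticFootMeasurement) -> float:
    """Static arch height index: AHI = DH / HTL."""
    if m.heel_to_toe_length_mm <= 0:
        raise ValueError("heel-to-toe length must be positive")
    return m.dorsum_height_mm / m.heel_to_toe_length_mm

# Hypothetical readings for one child in the two positions
seated = StaticFootMeasurement(heel_to_toe_length_mm=220.0, dorsum_height_mm=60.0)
standing = StaticFootMeasurement(heel_to_toe_length_mm=222.0, dorsum_height_mm=54.0)
print(f"seated AHI   = {arch_height_index(seated):.3f}")
print(f"standing AHI = {arch_height_index(standing):.3f}")
```

Computing the index separately for the sitting and standing readings mirrors the two positions used in the protocol.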

Fig. 2

Platform for measuring dorsum height and heel-to-toe length

Dynamic foot dimensions were acquired with a pedobarographic device (Emed®-n50 platform, Novel GmbH, Munich, Germany). This device has a sensor resolution of 4 sensors/cm² over an area of 475 × 320 mm, giving a total of 6080 sensors on the platform (Fig. 3).
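The stated sensor count follows directly from the sensing area and the sensor density:

```latex
N_{\text{sensors}} = (47.5\,\text{cm} \times 32.0\,\text{cm}) \times 4\,\frac{\text{sensors}}{\text{cm}^2}
                   = 1520\,\text{cm}^2 \times 4\,\frac{\text{sensors}}{\text{cm}^2}
                   = 6080\ \text{sensors}
```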

Fig. 3

Example of a digital footprint for dynamic foot arch measurement via pedobarography

The recording frequency was set at 50 Hz. The platform was embedded in a wooden walkway (600 × 3600 mm) so that it was level with the ground. Dynamic measurements were acquired with a two-step protocol that has been shown to be reliable [27]. For familiarization, the children were asked to walk across the walkway a few times at their usual walking speed. Instructions followed a predefined protocol: the children were told to walk as they usually would, looking straight ahead, at a self-selected, comfortable speed and cadence. An individual start marker was placed on the ground so that the second step landed on the platform. The children walked in both directions until three trials each for the left and right foot had been captured. If a participant aimed at the platform, stepped on its border, or altered their gait, the trial was excluded and repeated. If there were four or more correct measurements for one foot, the first three were used for data analysis.
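The trial-selection rule above can be expressed as a small filter. This is a minimal sketch, assuming each captured step is logged with its foot, recording order, and a validity flag; the `Trial` record and `select_trials` helper are hypothetical names, not part of the Emed® software:

```python
from dataclasses import dataclass
from typing import List

REQUIRED_TRIALS = 3

@dataclass
class Trial:
    foot: str    # "left" or "right"
    order: int   # recording order within the session
    valid: bool  # False if the child aimed at the platform, hit its border, or altered gait

def select_trials(trials: List[Trial], foot: str) -> List[Trial]:
    """Return the first three valid trials for one foot, in recording order."""
    valid = sorted((t for t in trials if t.foot == foot and t.valid),
                   key=lambda t: t.order)
    if len(valid) < REQUIRED_TRIALS:
        raise ValueError(f"only {len(valid)} valid {foot} trials; repeat the walk")
    return valid[:REQUIRED_TRIALS]  # extra valid trials (4th and later) are ignored

# Hypothetical session log
session = [
    Trial("right", 1, True), Trial("left", 2, False), Trial("right", 3, True),
    Trial("left", 4, True), Trial("right", 5, False), Trial("right", 6, True),
    Trial("left", 7, True), Trial("right", 8, True), Trial("left", 9, True),
]
print([t.order for t in select_trials(session, "right")])  # [1, 3, 6]
```

Raising an error when fewer than three valid trials exist reflects the protocol's instruction to repeat measurements until three correct trials per foot are available.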

The dynamic arch index of the Emed® system was calculated with an algorithm that divides the footprint into three regions: forefoot, midfoot, and hindfoot. The arch index is defined as the midfoot area divided by the area of the whole foot, excluding the toes [17, 20].
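The arch index definition can be illustrated on a binary contact grid. This is a minimal sketch, not the Emed® algorithm: it assumes the toes have already been masked out, that grid rows run along the long axis of the foot, and that the toe-less footprint length is split into three equal thirds (the classic arch index convention). The Emed® masking rules may differ, and the footprint below is hypothetical:

```python
import numpy as np

def arch_index(contact: np.ndarray) -> float:
    """Midfoot contact area / total contact area, toes excluded.

    `contact` is a boolean grid; rows run along the foot's long axis,
    columns across the foot, and True marks a loaded sensor cell.
    """
    contact = np.asarray(contact, dtype=bool)
    rows = np.flatnonzero(contact.any(axis=1))  # rows touched by the footprint
    if rows.size == 0:
        raise ValueError("empty footprint")
    first, last = rows[0], rows[-1] + 1
    length = last - first
    # Boundaries of the middle third of the toe-less footprint length
    b1 = first + length / 3
    b2 = first + 2 * length / 3
    row_idx = np.arange(contact.shape[0])
    midfoot = (row_idx >= b1) & (row_idx < b2)
    per_row = contact.sum(axis=1)               # loaded cells per row
    return float(per_row[midfoot].sum() / per_row.sum())

# Hypothetical 9 x 6 footprint (heel at row 0, toes already removed)
foot = np.zeros((9, 6), dtype=bool)
foot[0:3, 1:5] = True  # hindfoot: 4 cells wide
foot[3:6, 3:5] = True  # midfoot (lateral band): 2 cells wide
foot[6:9, 0:5] = True  # forefoot: 5 cells wide
print(round(arch_index(foot), 3))  # 0.182
```

A larger midfoot contact share, and therefore a higher arch index, indicates a flatter foot.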

Statistical Analysis

All data were exported to a spreadsheet and converted into a file compatible with STATA Version 14.0 (StataCorp. LP, College Station, TX, USA). The inter-rater agreement of the eleven raters with respect to the children’s foot arch categories was described with the Fleiss kappa statistic. To account for the different prevalences of the foot arch categories, prevalence-adjusted and bias-adjusted kappa (PABAK) values were additionally calculated [28]. The combined kappa is the appropriately weighted average of the category-specific kappas. Two models were used to quantify the agreement between objective and subjective assessments of foot arches. First, the sensitivity and specificity of the objective measurements (the dynamic and both static measures) for identifying low, normal, and high arched feet (as classified by the clinical experts) were estimated using ROC curves and the area under the curve (AUC). An ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) obtained at different thresholds. Second, a mixed logistic regression model with calculation of median odds ratios was used to compare the diversity of the clinical experts’ ratings in every single case [29]. All three predictors, namely the standing and seated arch height index and the dynamic arch index, were included as fixed effects. Additionally, both clusters, i.e., rater and foot, were modeled as crossed random effects. To test the significance of the predictors, nested models were compared using the likelihood-ratio test.
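The agreement statistics named above can be sketched in a few lines. This minimal Python/NumPy version computes Fleiss' kappa from a children × categories table of rating counts, the category-specific kappas (each category vs. the rest), their prevalence-weighted average (which reproduces the overall kappa), and PABAK. For PABAK it assumes the common multi-category form (k·P̄ − 1)/(k − 1) overall and 2·P_o − 1 per category; the exact Stata routine used in the study is not specified, so details may differ. The example table is hypothetical:

```python
import numpy as np

def fleiss_agreement(counts):
    """Agreement statistics for N subjects each rated by the same n raters.

    counts[i, j] = number of raters assigning subject i to category j
    (e.g. columns = high, normal, low arched). Assumes every category is
    used at least once; otherwise its category-specific kappa is undefined.
    """
    counts = np.asarray(counts, dtype=float)
    N, k = counts.shape
    n = counts[0].sum()
    if n < 2 or not np.allclose(counts.sum(axis=1), n):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    p = counts.sum(axis=0) / (N * n)  # prevalence of each category
    q = 1.0 - p

    # Mean proportion of agreeing rater pairs per subject, and chance agreement
    p_bar = np.mean((np.sum(counts**2, axis=1) - n) / (n * (n - 1)))
    p_e = np.sum(p**2)
    kappa = (p_bar - p_e) / (1.0 - p_e)

    # Category j vs. rest: d[j] = half the proportion of disagreeing pairs
    d = np.sum(counts * (n - counts), axis=0) / (N * n * (n - 1))
    kappa_j = 1.0 - d / (p * q)
    combined = np.sum(p * q * kappa_j) / np.sum(p * q)  # equals kappa

    pabak_j = 2.0 * (1.0 - 2.0 * d) - 1.0  # 2 * Po_j - 1
    pabak = (k * p_bar - 1.0) / (k - 1.0)
    return {"kappa": kappa, "kappa_j": kappa_j, "combined": combined,
            "pabak": pabak, "pabak_j": pabak_j}

# Hypothetical example: 4 children, 3 raters, columns = high, normal, low
table = [[2, 1, 0],
         [0, 3, 0],
         [0, 1, 2],
         [0, 0, 3]]
print(fleiss_agreement(table))
```

The `combined` entry makes the "weighted average" statement concrete: weighting each category-specific kappa by p_j(1 − p_j) returns exactly the overall Fleiss kappa.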

Results

Among all ratings of all experts, high arched feet had a prevalence of 8.11% while 56.63% of feet were classified as normal arched and 35.26% as low arched. The static AHI had a mean ± SD of 0.276 ± 0.02 (range, 0.228–0.344) in the sitting position and 0.243 ± 0.017 (range, 0.200–0.296) in the standing position. The dynamic arch index assessed with the Emed®-n50 platform had a mean value ± SD of 0.175 ± 0.068 (range, 0.015–0.324) (Table 1).
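Assuming each of the eleven experts rated all 74 children, the reported prevalences correspond to whole-number counts of ratings. The counts below are back-calculated here as an arithmetic check; they are not reported in the source:

```latex
74 \times 11 = 814 \text{ ratings}, \qquad
\frac{66}{814} \approx 8.11\,\%, \quad
\frac{461}{814} \approx 56.63\,\%, \quad
\frac{287}{814} \approx 35.26\,\%, \qquad
66 + 461 + 287 = 814
```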

Table 1 Distribution of answers regarding the prevalence of high, normal, and low arched feet among the 74 children

Rater Agreement

The distribution of ratings among the eleven clinical foot experts for the 74 children is shown in Fig. 4. Fair agreement was found for high arched feet (Fleiss kappa = 0.228); after adjusting for the low prevalence of high arches, agreement increases to substantial (PABAK = 0.770). For normal arched feet, both agreement measures show fair agreement (Fleiss kappa = 0.366, PABAK = 0.3776). The classification of low arched feet reached moderate agreement between experts (Fleiss kappa = 0.547, PABAK = 0.586). The combined kappa statistic for all three foot types shows borderline moderate agreement (Fleiss kappa = 0.422, PABAK = 0.525). For all agreement measures, the hypothesis that the raters made their classifications randomly can be rejected (p < 0.001 each).
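The verbal labels in this paragraph (fair, moderate, substantial) are consistent with the widely used Landis and Koch benchmarks for kappa. The source does not name the scale it used, so this mapping is an assumption. The minimal sketch below reproduces the labels and, by inverting PABAK = (k·P_o − 1)/(k − 1), recovers the observed agreement implied by a reported PABAK value:

```python
# Landis & Koch (1977) benchmarks, assumed here as the interpretive scale
BENCHMARKS = ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
              (0.80, "substantial"), (1.00, "almost perfect"))

def agreement_label(kappa: float) -> str:
    """Map a kappa or PABAK value onto the Landis & Koch verbal scale."""
    if kappa < 0:
        return "poor"
    for upper, label in BENCHMARKS:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")

def observed_agreement_from_pabak(pabak: float, k: int = 2) -> float:
    """Invert PABAK = (k * Po - 1) / (k - 1); k = 2 for one category vs. the rest."""
    return (pabak * (k - 1) + 1) / k

reported = {"high (kappa)": 0.228, "high (PABAK)": 0.770,
            "normal (kappa)": 0.366, "low (kappa)": 0.547,
            "combined (kappa)": 0.422, "combined (PABAK)": 0.525}
for name, value in reported.items():
    print(f"{name}: {value:.3f} -> {agreement_label(value)}")
print(f"high-arch Po implied by PABAK: {observed_agreement_from_pabak(0.770):.3f}")
```

For the rare high-arch category, a large share of rater pairs agree simply because most feet are not high arched. Fleiss kappa discounts this agreement as chance agreement, while PABAK does not, which is why the two statistics diverge most for that category.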

Fig. 4

Pattern score showing the ratings of all experts for all participants

Relationship Between Experts’ Rating and Objective Foot Arch Measurements

The dynamic arch index shows the highest agreement with the foot experts’ ratings for the ROC curves for low arched vs non-low arched feet (AUC = 0.68) and normal vs low arched feet (AUC = 0.67). The highest agreement between the standing arch height index and foot experts’ ratings was found for high arched vs non-high arched feet (AUC = 0.64) and normal vs high arched (AUC = 0.60). The seated arch height index shows the lowest agreement with foot experts’ ratings (AUC = 0.48–0.62). Consequently, the best agreement between objective measurement and clinical observation for all three measurements combined is obtained for low arched feet (Fig. 5).
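An AUC can be read as the probability that a randomly chosen foot from the expert-defined target group (e.g., low arched) has a more extreme index value than a randomly chosen foot from the comparison group, so 0.5 means chance-level discrimination. The minimal sketch below uses the Mann–Whitney formulation and assumes one binary reference label per foot. How the eleven ratings were reduced to a reference label (e.g., by majority vote) is not stated, so that step is an assumption here, and the index values are hypothetical:

```python
from typing import Sequence

def auc_mann_whitney(scores: Sequence[float], is_target: Sequence[bool]) -> float:
    """ROC AUC as P(target score > non-target score), ties counted as 1/2."""
    if len(scores) != len(is_target):
        raise ValueError("scores and labels must have equal length")
    target = [s for s, y in zip(scores, is_target) if y]
    other = [s for s, y in zip(scores, is_target) if not y]
    if not target or not other:
        raise ValueError("need at least one foot in each group")
    wins = 0.0
    for t in target:
        for o in other:
            if t > o:
                wins += 1.0
            elif t == o:
                wins += 0.5
    return wins / (len(target) * len(other))

# Hypothetical dynamic arch index values; True = rated low arched by the reference.
# A higher arch index means more midfoot contact (a flatter foot), so larger
# scores are expected for low arched feet.
arch_index = [0.10, 0.24, 0.22, 0.19, 0.18]
low_arched = [False, False, True, True, False]
print(round(auc_mann_whitney(arch_index, low_arched), 3))  # 0.667
```

For high arched feet the expected direction flips: the target group should show a higher arch height index. An AUC near 0.5, like the lowest value for the seated AHI, therefore indicates essentially no discrimination.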

Fig. 5

(a–c) ROC curves for dynamic arch index, seated, and standing AHI in relation to expert ratings.

Additionally, the ordinal logistic regression analysis yielded median odds ratios of 2.48 for the diversity of the clinical experts’ ratings in every single case and 12.87 for the diversity of foot ratings among clinical experts.
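The median odds ratio converts a random-intercept variance σ² on the log-odds scale into an odds ratio: MOR = exp(√(2σ²) · Φ⁻¹(0.75)). It is the median odds ratio between two randomly chosen clusters (here raters or feet), so MOR = 1 means no between-cluster variation. The minimal sketch below implements the formula and its inverse, which shows the random-effect variance implied by each reported MOR. The model's variance estimates are not reported here, so the back-calculated values are illustrative only:

```python
import math
from statistics import NormalDist

Z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard normal, about 0.6745

def median_odds_ratio(variance: float) -> float:
    """MOR = exp(sqrt(2 * sigma^2) * Phi^-1(0.75)) for a random-intercept variance."""
    if variance < 0:
        raise ValueError("variance must be non-negative")
    return math.exp(math.sqrt(2.0 * variance) * Z75)

def variance_from_mor(mor: float) -> float:
    """Invert the MOR formula to recover the random-intercept variance."""
    if mor < 1:
        raise ValueError("a median odds ratio is always >= 1")
    return (math.log(mor) / Z75) ** 2 / 2.0

for mor in (2.48, 12.87):  # values reported in the Results
    var = variance_from_mor(mor)
    print(f"MOR {mor}: implied variance = {var:.2f}, round trip = {median_odds_ratio(var):.2f}")
```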

Discussion

Rater Agreement

Clinical foot arch assessment seems to be influenced by the expert’s experience and clinical specialization, and by the foot morphology itself [25]. Previous research has examined inter-rater agreement only in the adult population [11, 25, 26, 30]. Therefore, the aim of this trial was to test inter-rater agreement in a pediatric population, which is known to show a broad spectrum of variation.

The results of this study show fair to borderline moderate agreement between raters for the clinical assessment of the medial longitudinal arch in children. Whereas moderate agreement between raters was reached for feet rated as low arched, feet rated as high or normal arched showed only fair agreement. Therefore, clinical foot type ratings by experts without the use of objective methods seem to be insufficient. These results agree with Chuckpaiwong et al. [11], who indicated the need to measure multiple parameters to classify foot types clinically.

Previous studies on the reliability and validity of existing foot rating systems focused solely on the adult population. Dahle et al. [26] tested the inter-rater reliability of foot arch rating among three physical therapists who classified feet as high, normal, or low arched. They reported high reliability, with agreement in 55 of 77 ratings (71.4%). Cowan et al. [25] analyzed the agreement among four surgeons and two podiatrists using photographs and reported poor agreement between raters. Chuckpaiwong et al. [11] reported 80% agreement for clinical foot arch rating among 147 surgeons using visual assessment of photographs. The most recent study, by Terada et al. [30], examined intra- and inter-rater reliability as well as between-rater agreement using the five image-based criteria from the Foot Posture Index. Excellent intra-rater but only poor to moderate inter-rater reliability was reported, and classifying foot posture did not improve agreement between raters. However, methodological differences and variations in sample size and number of experts make it difficult to compare the results of these studies. The quality of ratings may also have been influenced by the raters’ professions (surgeons, podiatrists, or physical therapists) and level of training. Additionally, the assessment procedures ranged from visual clinical examination by the experts [26] to rating of photograph series [11, 25]. The quality of the photographs may also have differed between studies (e.g., a mirrored foot photo box [11] vs. a digital camera in our study).

Lastly, the studies by Cowan et al. [25] and Dahle et al. [26] had a high risk of bias: the experts were not blinded to the outcome measures, and they participated in training sessions on foot type assessment in which they could discuss the rating procedure before the actual test. In summary, the literature on agreement between clinical experts in classifying foot arches in adults is inconclusive, and differences in methodology complicate comparison. While two studies [11, 26] report good agreement, two others [25, 30] found only poor to moderate agreement. Our results suggest that rater agreement in a pediatric population might be even poorer. As this is the first study on this subject, further research with similar protocols is warranted to confirm these findings.

Relationship Between Experts’ Rating and Objective Foot Arch Measurements

The ROC curves and AUC values (ranging from 0.4811 to 0.6828) indicate poor agreement between the experts’ foot arch ratings and the dynamic and static arch measurements. The dynamic arch index showed the highest agreement in the ROC analyses for low arched vs non-low arched feet and for normal vs low arched feet, while the standing arch height index showed the highest agreement for high arched vs non-high arched feet (AUC = 0.6446) and normal vs high arched feet (AUC = 0.5958).

Considering that the photographs for the clinical rating were taken under static conditions, it is notable that the dynamic footprint measurements showed partly better agreement than the static seated and standing arch height indices. This questions the validity of static measurements for clinical foot arch rating in children. Previous studies on the relationship between static and dynamic foot parameters have already shown that the two types of measurement may assess different aspects of foot morphology. Therefore, purely static evaluations of children’s arches may be diagnostically less conclusive. Chang et al. [31] showed moderate correlations between the static foot arch volume index and dynamic measurements (contact areas under the entire foot) in children. Teyhen et al. [32] showed that a multivariate model based on plantar parameters during gait was able to predict 60% of the variability in static arch height in adults. However, a recent study by Scholz et al. [33] found low agreement between the dynamic arch index measured during gait and the static arch height index in children measured in seated (r = −0.070) and standing (r = −0.138) positions. The authors attributed these conflicting results to the fact that foot characteristics in children are still developing [33]; most pediatric flat feet are flexible, i.e., the medial longitudinal arch is present during non-weight-bearing and disappears during weight-bearing [4]. It could therefore be speculated that static and dynamic foot arch measurements are influenced by this mechanism. Overall, in agreement with previous studies, our findings show that no “gold standard” currently exists for clinical rating of the foot arch. Further research should focus on validating foot arch measurements in children, and a combination of objective and subjective measurements seems to be a promising approach.
Additionally, the distinction between flexible and rigid flat feet needs to be considered because of their biomechanical and structural differences as well as their clinical implications.

In our study, the best agreement between raters and objective measurements was found for low arched feet. Considering that inter-rater agreement was also highest for feet rated as low arched, low arches seem to be easier for clinical experts to identify. Nevertheless, the low agreement between raters is likely to affect the relationship between the clinical assessment and the objective measurements. In previous literature, only one study [11] examined the correlation between clinical foot assessments and quantitative foot type measurements. Although it showed good correlations between subjective and objective measurements (r = 0.662 to 0.780), no dynamic footprint measurements were included. In contrast to the present study, the anthropometric measurements, such as the arch height index, always showed better agreement with clinically assessed arch types than the footprint measurements did. These different results may be explained by different measurement procedures or different populations. In Chuckpaiwong et al. [11], anthropometric arch height measurements were conducted under 90% weight-bearing, with participants standing on one leg. In the present study, the arch height index was measured under 50% and 10% weight-bearing. Additionally, the footprint measurements in this study were acquired with a two-step protocol originally described by Oladeji et al. [27], while Chuckpaiwong et al. [11] obtained static footprints in the same 90% weight-bearing condition. The comparability of both studies may also be limited by differences in the tested populations: our study included children between 5 and 13 years of age, whereas Chuckpaiwong et al. [11] had no age limitation for their participants.
The low agreement among raters and between objective and subjective assessments could therefore also result from the difficulty of rating feet during growth and development.

Limitations

The static measurement platform was self-built, based on the arch height index measurement system by Butler et al. [34], which has been used in a range of studies to measure static foot dimensions. The device was not validated but is thought to be equivalent to the original system. Additionally, the test protocol may have introduced a systematic bias: because of time limitations, static measurements were acquired only once for each of the left and right feet.

Another important point is that the children who participated in this study were recruited from a sports camp; therefore, their physical constitution may differ from that of the general population.

A major issue that might influence the ROC curves is the lack of a gold standard for assessing the medial longitudinal arch. When examining the relationship between the two procedures, it is therefore not possible to determine the true classification of the feet or which procedure is correct [35].

The main limitation is the question of whether clinical rating via photographs is comparable to assessment in a real clinical setting. The raters could only use photographs taken under static conditions to assess the foot arch and could not see the participants in person. In a real clinical examination, the expert can see the whole patient, inspect the foot dynamically, and consider other clinical features such as leg axis, muscle function, and direct physical examination in his or her clinical judgment. However, because arranging in-person meetings between all raters and participants would have required enormous time and effort, clinical assessment via high-quality, multiangle photography seemed a more feasible approach for this study. Further research should therefore investigate the agreement between foot arch rating via photographs and clinical rating in person.

Conclusion

This study showed only borderline moderate agreement between raters in the clinical assessment of the medial longitudinal arch in children. Furthermore, only a poor relationship was found between the subjective clinical foot assessments and the objective static and dynamic foot measurements. Currently, there is no generally accepted method for foot arch assessment. As this is the first study to examine the rater agreement of clinical foot arch assessment and its relationship to objective measurements in children, further research is necessary to develop an optimal system for characterizing the medial longitudinal arch in young populations. A reproducible and valid characterization of children’s medial longitudinal arches is needed, and a consensus on assessment, especially in children, has to be established.