Background

Clubfoot, or congenital talipes equinovarus, is a condition present at birth in which the foot is held in a rigid turned-in position. High-quality corrective treatment remains a key requirement for reducing disability and improving function related to the deformity. Over the past decades there has been an increase in the use of the Ponseti method to correct clubfoot [1]. This method involves the simultaneous correction of three components of the clubfoot deformity through manipulation and serial casting. The equinus (downward pointing of the foot) is corrected last, often with a percutaneous Achilles tenotomy. This is followed by long-term use of a foot abduction brace at night to maintain the foot position [2]. Despite the global trend toward increased use of the Ponseti method, there remains variation in how success of clubfoot treatment is measured [3, 4].

The Ponseti method is administered by locally trained therapists in resource-constrained settings in Africa [5]. These clubfoot therapists often work alone and have no specialised physiotherapy or surgical support in the clinics or nearby. It is important that they have a user-friendly assessment system with agreed criteria for when treatment is not working and referral to a specialist for further management is indicated.

No globally accepted outcome scoring system exists to inform locally trained clubfoot therapists of the need for referral for further intervention. The most frequently used approach to measuring whether the Ponseti method has been successful (or not) is clinical assessment. In sub-Saharan Africa 68 to 98% of cases are reported to have a successful outcome with the Ponseti method [4]. This study aims to compare the results of the Ponseti method of clubfoot management at three to five years from initial correction using five different outcome measures. We explore the diagnostic accuracy of the outcome measures, which is the ability of the assessments to discriminate between the need for referral for further intervention and a successful outcome [6]. To assess this, outcome score results in this study are compared with a reference standard of ‘true’ treatment success status (defined by full clinical assessment). The results are categorised as true positive (referred and needed), false positive (referred but not needed), true negative (not referred and not needed), and false negative (should have been referred but was missed) [7]. Sensitivity is the proportion of children who need referral for further intervention and who are correctly classified by the outcome measure as requiring referral. Specificity is the proportion of children who do not need referral and who are correctly classified as not requiring referral by the outcome measure. Positive predictive value is the probability that a child with a positive outcome score result (referral indicated) truly needs referral for further intervention; negative predictive value is the probability that a child with a negative result truly does not need referral.
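The four categories and the measures derived from them can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `tally` and `accuracy_measures` and the six example children are hypothetical and are not data from this study. A "positive" result here means that the outcome measure indicates referral.

```python
def tally(score_refers, reference_refers):
    """Count TP, FP, TN and FN from two paired lists of booleans.

    score_refers[i]     -- the outcome measure says child i needs referral
    reference_refers[i] -- full clinical assessment says child i needs referral
    """
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for score, reference in zip(score_refers, reference_refers):
        if score and reference:
            counts["TP"] += 1   # referred and needed
        elif score and not reference:
            counts["FP"] += 1   # referred but not needed
        elif not score and not reference:
            counts["TN"] += 1   # not referred and not needed
        else:
            counts["FN"] += 1   # should have been referred but was missed
    return counts


def accuracy_measures(c):
    """Derive the four measures; assumes every denominator is non-zero."""
    return {
        "sensitivity": c["TP"] / (c["TP"] + c["FN"]),
        "specificity": c["TN"] / (c["TN"] + c["FP"]),
        "ppv": c["TP"] / (c["TP"] + c["FP"]),
        "npv": c["TN"] / (c["TN"] + c["FN"]),
    }


# Hypothetical example: six children, NOT study data.
score = [True, True, False, False, False, True]
reference = [True, False, False, False, True, True]

counts = tally(score, reference)
measures = accuracy_measures(counts)
print(counts)
print(measures)
```

Sensitivity and specificity are computed within the columns of the reference standard, whereas the predictive values are computed within the rows of the outcome measure, so the predictive values depend on how common the need for referral is in the cohort.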

Methods

Study design and population

This study was conducted and reported according to established STARD (Standards for Reporting of Diagnostic Accuracy Studies) guidelines [8] (Additional file 1). A cohort study of 218 children with idiopathic clubfoot was conducted in 2016. The children were managed with manipulation and casting at Parirenyatwa Hospital, Harare and the results are published elsewhere [9]. All children with a diagnosis of unilateral or bilateral idiopathic clubfoot who started treatment with the Ponseti method at the study hospital between 22nd March 2011 and 23rd April 2013 (25 months) were included in the cohort. The only exclusion criterion was a foot condition other than idiopathic clubfoot, for example clubfoot associated with neural tube defects such as spina bifida.

Sampling technique

The phone numbers of all carers of the cohort children were extracted from the clinic records in January 2017 and contact with them was attempted at least three times. Caregivers and their children were invited to attend the study. The children were between 3.5 and 5 years from initial casting.

Ethics, consent and permissions

Ethical approval for this study was granted by the Medical Research Council of Zimbabwe (MRCZ) and the London School of Hygiene & Tropical Medicine (LSHTM) (ref:11132 /RR/4725). All children and their caregivers were read an information sheet about the study and given an opportunity to ask questions. If they agreed to participate, written consent was taken from the caregiver who remained present throughout the assessment as per national requirements. Transport costs were reimbursed and referral services available in Harare were mapped pre-emptively to ensure appropriate onward referral for any children that required further intervention.

Data collection

Two physiotherapists who are experienced in co-ordinating national clubfoot programmes reviewed the assessment tools over three days for contextual relevance. The questionnaires were available in English and Shona and were cognitively tested. We used five outcome methods, three that give a score, and two that give a binary (success/failure) outcome. The Roye score [10] is a self-reported measurement that is used in high income settings. The Bangla clubfoot assessment tool [11] and the Assessing Clubfoot Treatment (ACT) score [12] combine physical assessment and parent reported outcome measures, and have been developed for low resource settings. The Bangla score includes a functional assessment. The two binary outcomes were assessment of a plantigrade foot [5] and the relapse pattern [13]. The study protocol was pilot tested for suitability in July 2016. Children were examined independently in January 2017 by the two physiotherapists and a decision was made on whether referral for further intervention (re-casting or surgical review) was required. Clinical examination comprised observation, physical assessment and functional review; it included assessment of passive and active range of motion (plantarflexion, dorsiflexion, eversion, inversion of the foot, and knee extension), muscle strength tests of the calf and evertors of the foot, heel raises, squatting ability and gait analysis (walking and running).

Data management and analysis strategy

The data were entered into Microsoft Excel 2000 (Microsoft Inc., Redmond, Washington) and analysed using Stata 14.1 (StataCorp, 4905 Lakeway Drive, College Station, Texas 77845, USA). Statistical significance was set at the 95% confidence level. The inter-observer variation for the measurement of the physical assessment tools was assessed using the intra-class correlation coefficient (ICC), with a threshold of ICC ≥0.75 [10]. Outcomes of children who had completed casting and ≥ two years of bracing were compared to all of the children who were followed up, and to those who had only completed casting. A two-tailed paired t-test was used to assess the mean difference between the outcome measures of Roye, Bangla and ACT scores. Fisher’s exact test of independence was used to assess the difference in proportion of children with an outcome of relapse and plantigrade foot. The five measures were compared against the standard of whether referral for further intervention was required (for re-casting or surgical review) as defined by a consensus agreement of two expert physiotherapists with experience of managing clubfoot in countries in Africa. Sensitivity, specificity, positive and negative predictive values were calculated for the five measures and compared to full clinical assessment (gold standard). The threshold for diagnostic accuracy was based on previous studies and was defined prior to the study. It was set at 70% for the three scores with continuous scales [14] and positive/negative for the binary outcomes [7].
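For readers unfamiliar with Fisher's exact test, the calculation can be sketched as follows. This is a minimal illustration in Python rather than the Stata procedure used in the study; the function name `fisher_exact_two_sided` and the 2x2 counts are hypothetical, and the two-sided p-value uses one common convention (summing the probabilities of all tables that are no more likely than the observed one, with the margins held fixed).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count

    # Sum every table that is at least as unlikely as the observed table;
    # the small relative tolerance guards against floating-point ties.
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


# Hypothetical counts, NOT study data:
# rows = two outcome categories, columns = good / poor outcome.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```

The exact test is preferred over a chi-squared approximation when cell counts are small, as they can be in a cohort of this size.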

Results

Of the cohort, 31% (68/218) attended for review and were assessed. Fifty (73%) children were boys and 18 (27%) were girls. There were 35 (51%) bilateral and 33 (49%) unilateral clubfeet. Tenotomies had been performed in 52 (76%) cases and the average number of casts to correction was 6.9 (5.9–8.0 casts). The average length of time attending appointments from initial review was 30 months (26–35 months). Of the children followed up, 24 (35%) attended clinic reviews for 4–5 years (Fig. 1).

Fig. 1 Length of time child attended clubfoot clinic appointments

All tools demonstrated good reliability, with an intra-class correlation coefficient (ICC) of ≥0.82 on all criteria (Table 1). An ICC of 1.00 demonstrates perfect correlation.

Table 1 Inter-observer variation for outcome measures
Table 2 Results of cohort of children followed up (n = 68)

In the children who were followed up (n = 68) the success of treatment with different scores varied between 56 and 89% (Table 2). In the children who completed casting (n = 63) it was between 57 and 93%; and in the children who completed casting and at least two years of bracing (n = 38) it was from 58 to 97% (Table 3). The individual category calculations for each outcome measurement are in Additional files 2, 3, 4 and 5.

Table 3 Results of cohort of children followed up who completed > 2 years bracing (n = 38)

Relapse and the Bangla tool gave the lowest proportions of good outcome, at 56 and 59% respectively. Figure 2 demonstrates the variation in outcome when compared to full clinical assessment (the gold standard illustrated in the first row of the figure). Of the children who completed ≥2 years of bracing, 87% (33/38) were assessed as successfully treated with full clinical assessment. The scores that demonstrate a higher success (Plantigrade: 97% and Roye score: 94%) miss cases that require further intervention. The scores that demonstrate a lower success (Relapse: 58% and Bangla: 66%) are restrictive in the measurement of success.

Fig. 2 Comparison of outcomes to measure success against full clinical assessment

There was strong evidence for a difference between the outcomes of the Roye score and the Bangla score (p < 0.0001), the Roye and the ACT score (p = 0.0013), and the ACT and Bangla score (p < 0.0001). It follows that none of these assessments can provide essentially the same estimate of success as the other measures.

There was a difference in the relative proportion of the cohort with relapse and plantigrade foot when assessed with Fisher’s exact test (p = 0.012). The binary outcomes are therefore not interchangeable.

No adverse events occurred as a result of any of the outcome measures undertaken. When compared to the standard of full clinical assessment and the subsequent decision on the need for referral for further intervention, the Roye score had a sensitivity of 31.8% (95%CI: 13.9–54.9%) and a specificity of 100% (95%CI: 92–100%), with positive and negative predictive values of 100 and 74.6% respectively. The Bangla score demonstrated 79.2% (95%CI: 57.8–92.9%) sensitivity and 79.5% (95%CI: 64.7–90.2%) specificity with 67.9% positive predictive and 87.5% negative predictive values, and the ACT score had 79.2% (95%CI: 57.8–92.9%) sensitivity and 100% (95%CI: 92–100%) specificity in predicting the need for referral, with positive and negative predictive values of 100 and 89.8% respectively. Of the 44 children that did not require referral for further intervention, all achieved plantigrade or more (positive predictive value: 100%) and of those who did require referral (n = 24), 14 were identified with the plantigrade assessment (achieved less than plantigrade). The relapse score was most restrictive in identifying good outcome. False positive and false negative scores are displayed in Table 4.

Table 4 A comparison of measurement methods with the need for referral for further intervention
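The confidence intervals reported for the ACT score can be approximately reproduced with an exact binomial interval. The sketch below makes two assumptions that are not stated in the text: that the intervals are exact (Clopper-Pearson) intervals, and that the underlying counts are 19 of 24 children needing referral correctly identified (24 needed referral and five were missed) and 44 of 44 children not needing referral correctly classified. The function names are hypothetical and the bounds are found by simple bisection using only the Python standard library.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_increasing(f, target, steps=60):
    """Bisection on [0, 1] for f(p) = target, where f is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes out of n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2
    lower = 0.0 if k == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2
    upper = 1.0 if k == n else solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2
    )
    return lower, upper


# Counts reconstructed from the text (an assumption, see lead-in).
sens_ci = exact_ci(19, 24)   # ACT sensitivity, reported as 57.8-92.9%
spec_ci = exact_ci(44, 44)   # ACT specificity, reported as 92-100%
print([round(100 * x, 1) for x in sens_ci])
print([round(100 * x, 1) for x in spec_ci])
```

With only 24 children needing referral, the interval around sensitivity is wide, which is worth bearing in mind when comparing the tools.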

Discussion

This study found that five scoring systems that are used to report outcomes of clubfoot treatment provided a wide spectrum of success (from 56 to 89% of cases) in a cohort with 3.5–5 years of follow up. When compared with the standard of clinical assessment, missed referrals ranged from 7.4% (the Bangla and ACT scores) to 22.7% (the Roye score). The measurements assess different aspects of clubfoot correction, from parent reported outcome measures (the Roye score) to scores that include physical assessment (the Bangla and ACT score) and single measurements (plantigrade foot and evidence of recurrence). Success improves in all measures with the completion of casting and at least two years of bracing.

Comparison to previous studies

Few studies compare measurement tools in the same patients, which limits the evidence against which to compare our findings. However, success of treatment in this cohort is similar to other studies in sub-Saharan Africa (between 63 and 98% of cases) [9]. Non-adherence and surgical intervention, often defined as failure, are reported to vary from 7 to 61% and from 3 to 39.4% [15] respectively. Ponseti and Laaveg [16] describe a scoring system that rates functional results as satisfactory in 88.5% of feet. Further studies describe success using the Ponseti and Laaveg system as 89.3% [17]. The criteria require a goniometer and the tool was therefore not included in evaluation of this cohort.

Use of outcome measures

The ease of use and rate of incorrect classification in the tools used to measure success need to be considered when selecting an outcome measure. Single-item scales for assessment of individual children (such as plantigrade foot or evidence of relapse) require no further calculation and may be easier to use in clinics; however, their simplicity may not allow a full assessment of success. Multi-item scales prove difficult to transform into useful statistics without technology and are unlikely to be routinely used in clinics. This study found no clear agreement between the different outcome measurements in use.

All of the assessments used in this study have limitations. The Roye score has been validated in high income settings and parents in our study reported difficulty in answering the question of “How often does your child have problems finding shoes that he or she likes?” as it was understood to be related to the availability of a variety of shoes. The Bangla score took the longest time to transform with statistical analysis. The acceptability and feasibility of the ACT score need to be studied in future research. The ACT score is likely easy to teach; however, this is unknown as the examiners were physiotherapists, and the time taken for other cadres of health workers to use the ACT tool is also unknown. With regard to the relapse score, Bhaskar et al. (2013) considered ankle dorsiflexion < 15 degrees with knee in extension as grade IA relapse. This may be a reason for the restriction in defining good outcome, as an evaluation of 85 normal feet in children found that the mean ankle dorsiflexion was 12.8 degrees with knees in extension [18]. Greater than 15 degrees may therefore be difficult to achieve.

Relationship between the outcome measures and clinical assessment

The Bangla and ACT tools were most helpful in predicting the need for referral for further intervention (specialist opinion or for further manipulation and casting). The five referrals that were missed with the ACT score were children who required review of a mobile curvature of the lateral border of the foot or supination in swing phase, neither of which is assessed with the score. Despite this, the ACT tool demonstrates the best diagnostic accuracy for the need for referral for further intervention.

Strengths and limitations of study

This study reports on five measurements of success in a cohort at 3.5–5 years from initial treatment. Repeat phone calls facilitated assessments when caregivers were initially unavailable. Two independent raters reduced the likelihood of reporting bias and all outcome measures were verified by the reference standard. The threshold for diagnostic accuracy was based on previous studies and was defined prior to the study. There were also study limitations. No distinction between a clubfoot that may not have been fully corrected and a relapsed clubfoot was made, and all cases with elements of the deformity were classified with the relapse score, which may be a source of potential bias that underestimates the accuracy of the relapse score. The tools were chosen based on ease of use in low resource high volume clinics and were not all initially developed to identify need for referral for further intervention.

Implications for practice

Task shifting and task sharing between orthopaedic and non-specialised health workers in some clinics means that outcome measures are even more important as teams expand. As older children are being treated with the principles of the Ponseti method [19], expert guidance on assessment and measurement in these cases is needed. The Roye score is overly optimistic of good outcomes, the Bangla score is restrictive in identifying good outcome, and the ACT score most closely aligns to clinical examination. However, the Bangla, relapse and ACT scores closely agree on false negatives and have the least chance of missing recurrence; the Bangla score and the relapse score over-estimate referral needs compared to the ACT score.

Conclusions

In this small comparative study, missed referrals ranged from 7.4% (the Bangla and ACT scores) to 22.7% (the Roye score) when compared with the standard of clinical assessment. Ease of use and the cost of false positives need to be considered in the selection of a tool. All scores demonstrated good reliability. The Roye score will miss cases and the Bangla and the Relapse tools are restrictive in assessment of successful outcome. We found no clear agreement between the different scores in use. When compared to the normal practice of full clinical assessment, the measurement tool with the best evidence for diagnostic accuracy was the ACT tool.