Introduction

Foot and ankle pathologies are often secondary to traumatic and non-traumatic problems, and include but not limited to: plantar fasciopathy, hallux valgus, metatarsalgia, hammer toe, ankle sprain, osteoarthritis and rheumatoid arthritic changes [1, 2]. The prevalence of foot problems ranges from 6 to 30% in general population [3, 4]. The situation becomes more apparent in the aging population. Persistent painful foot conditions occur in about 24% of older adults [5]. Individuals aged 65 and above are generally considered as older adults [6]. Such conditions have been treated by different approaches. Patient-reported outcome measures (PROMs) are very helpful tools in assessing the efficacy of the response to these treatments [7]. Currently, there are several PROMs used in foot and ankle research.

The Foot Function Index (FFI) is a self-reporting outcome measure developed by Budiman-Mak et al. [8]. In a systematic review by Rosenbaum et al., the FFI was identified as the most widely used foot complaint-related evaluation tool [9]. This instrument is found to be feasible, easily calculated, easily understood by the participants and takes less than 10 min to complete [10, 11]. The FFI has been applied to more than 5000 participants worldwide, with 20 different foot and ankle pathologies [11]. The use of FFI is wide and has been employed in studies with orthotic intervention, physical therapy and surgery for various problems [12,13,14,15]. It has shown excellent validity, reliability and responsiveness in previous studies [11]. It is also compatible with other outcome measures like SF-36 for analyzing foot and ankle problems [16]. Although the FFI has been translated to numerous languages [10, 16,17,18,19,20,21,22,23,24,25] yet no Arabic version has been published. An Arabic version for the general population of Saudi Arabia and Arabic-speaking countries would make it more convenient by providing a resourceful OM tool for evaluating and managing foot and ankle disorders.

Hence, the present study was aimed to translate the FFI into Arabic and adapt it to the local culture. Additionally, the psychometric properties of the Foot Function Index—Arabic (FFI-Ar) version were tested, including the internal consistency, test–retest reliability and construct validity.

Materials and methods

Study design

This study was conducted in two phases: i) The translation and cross-cultural adaptation of the original FFI into Arabic (FFI-Ar); ii) the evaluation of the validity and reliability of the FFI-Ar using a cross-sectional study design on a sample of patients having foot and ankle disorders. The International Society for Pharmacoeconomics and Outcomes Research (ISPOR) guidelines were used to perform the adaptation process [26]. This study obtained ethical approval from the University Ethics Committee of University of Strathclyde, Glasgow, UK, and locally from the Institutional Review Board of the College of Medicine, King Saud University, Riyadh, KSA (Research Project No. E-21-5703).

Phase 1: translation and cross-cultural adaptation of FFI

The FFI was translated from English into Arabic and culturally adapted using the ISPOR guidelines, which were approved by regulatory agencies such as the Food and Drug Administration and the European Medicines Agency. Translation and cultural adaptation of the FFI into Arabic was not merely a translation process, but one that took into account the cultural, idiomatic, linguistic and contextual aspects related to the translation. In the current study the recent trend of translation and cultural adaption was adapted which has been followed in several other studies [27,28,29].

In the first step (the forward translation), two independent Arabic translations of the original FFI were produced, by two bilingual translators. They were native Arabic-speaking health professionals, having experience in validation studies. In the following step, both the translated versions were merged together to produce a single reconciled version of FFI-Ar. In the third step, which is the investigation of the standard of the translated version, the FFI-Arabic was moved back into the source language. An independent backward translation of the reconciled FFI Arabic into English was produced by a backward translator. The translator was kept blinded of the original FFI. The next step which was the back translation review, the back-translated FFI and the original versions of FFI were compared by a native English-speaking investigator to investigate any short comings.

After translation and back translation were completed, the Arabic version of the FFI was formatted into the approved layout, which matched the original FFI. The FFI-Ar was applied to 15 patients for pilot testing or cognitive debriefing.

All participants were provided with the pre-final Arabic version of the questionnaire. The mean time to complete the questionnaire was 5.03 min (SD ± 0.76). Most participants did not experience any difficulty completing the questionnaire. However, one participant suggested elaborating the questions in items 13 and 14 by adding the word “during”. Two participants asked for an explanation of the term “assistive devices” in items 22 and 23, and stated that “assistive devices” should be better explained. All comments and suggestions stated by piloting patients were critically reviewed by the project team, and decisions about necessary revisions were made. The final translated version was prepared carefully before being sent for psychometric evaluation.

Phase 2: testing the psychometric properties of the FFI-Ar

The internal consistency, test–retest reliability and construct validity of the final version of the FFI-Ar were assessed among patients experiencing foot and/or ankle problems. Construct validity was examined with reference to the SF-36. All patients were living in Riyadh city of Saudi Arabia. Male and female participants between 18 and 70 years, primarily diagnosed with foot and/or ankle disorders and referred to medical rehabilitation department at King Saud University Medical City were invited to participate in this study. The final FFI-Ar was given to 50 native Arabic participants who can read and speak Arabic. Those individuals who were unable to read and speak Arabic and having comorbidities and cognitive problems were excluded from the study. Demographic data concerning each participant were collected using a separate form, while their level of pain was measured using the Numeric Rating Scale (NRS). Each participant completed both the FFI-Ar and the previously validated Arabic version of the SF-36. After a week, thirty percent of the participants with stable symptoms completed the FFI-Ar in order to assess the test–retest reliability of the questionnaire.

Data Collection

A signed consent form was obtained from each participant before administering the questionnaires. At baseline, all questionnaires were completed by each participant, which included the demographic data sheet, the FFI-Ar and the SF-36. The average time to complete all questionnaires was 10 to 15 min. For test–retest reliability, a sample (n = 16) completed the FFI-Ar for a second time. At the second visit, each participant of the subgroup completed the Global Rating of Change Scale (GRC) before the treatment session to identify participants with stable symptoms who would be involved in retest reliability assessment. Following this, a copy of the FFI-Ar was administered. The second assessment was taken after one to two weeks in order to avoid memorisation and prevent any bias.

Instruments

The FFI is a self-reporting, region-specific outcome measure. It covers several dimensions of foot function and has been used in relation to several pathologies of the foot and ankle. The FFI-Ar contains 23 items categorized into three subscales: pain, disability and activity limitation. The pain and disability subscales contain 9 items each, while the activity limitation subscale contains 5 items. Previous studies showed that the FFI has high validity, reliability and responsiveness. All participants completed the FFI-Ar and all those who were invited for retesting completed it a second time.

The NRS is a simple, unidimensional measure of pain intensity in adults that is comprised of an 11-point scale [30]. The Short Form 36 Health Survey (SF-36) is a multi-function, general health outcome measure with 36 questions. It produces an 8-scale profile of scores as well as two summary scores (physical and mental measures). The Arabic version has been found to be reliable and equivalent to the original version (English) [31]. The SF-36 – Arabic consists of 36 items that assess the physical and mental components of health according to 8 subscales: Physical functioning (PF), Role limitation due to physical health problems (RP), Bodily pain (BP), General health perceptions (GH), Vitality, energy and fatigue (VT), Social functioning (SF) and Role limitation due to emotional problems (RE).

Statistical analysis

Data analysis was completed using Statistical Package for the Social Sciences (SPSS) software (IBM SPSS Statistics for Windows, Version 23.0; IBM Corp., Armonk, NY, USA). Descriptive statistics (frequencies (%), means ± standard deviations) for the basic features of the participants’ demographic data were analysed. The Cronbach’s alpha was used to examine the internal consistency of the measurement. The test–retest reliability of the FFI-Ar was assessed using intraclass correlation coefficient (ICC) with a 95% confidence interval (95% CI). The Pearson’s correlation analysis was used to evaluate the construct validity and determine the association between the FFI-Ar and the SF-36 Arabic version and NRS. The p-value of significance was set at p ≤ 0.05. In this study, it was hypothesized that the FFI-Ar would reveal moderate to higher correlation with the NRS and physical-related SF-36 domains, i.e. convergent validity, while it would reveal weaker correlation with the mental-related domains of SF-36, i.e. discriminant validity.

Results

Demographic characteristics and descriptive statistics

The demographic and clinical data of the 50 participants are presented in Table 1.

Table 1 Demographic and clinical data (n = 50)

FFI-Ar score characteristics

The scores of the FFI-Ar were normally distributed. Descriptive statistics of the FFI-Ar subscales and total scores are listed in Table 2.

Table 2 FFI-Ar subscales and total score descriptive statistics (n = 50)

Floor and Ceiling effects

No floor or ceiling effects were found in the subscales and the total scores. Table 3 shows the floor and ceiling effects for all the subscales and total score.

Table 3 Floor and ceiling scores of the FFI-Ar subscales and total score

Internal consistency

Of the three subscales, internal consistency for the disability subscale was the greatest (Cronbach’s coefficient alpha (α) value 0.94). The Cronbach’s coefficient values were also good for the pain and activity limitation subscales (alpha value of 0.88 and 0.85, respectively). The internal consistency for the FFI-Ar total score was fair (alpha value 0.76). The corrected item-to-total correlations ranged from 0.48 to 0.82. Item 7 (Pain walking with orthotics) showed lowest correlation, while item 5 (Pain walking with shoes) showed the highest correlation. Deleting an item from the FFI-Ar did not significantly change the alpha level; the values ranged from 0.80 to 0.92 when an item was deleted at baseline (Table 4).

Table 4 Statistics of the FFI-Ar items

Test–retest reliability

A test–retest reliability analysis was undertaken on 16 stable participants. Stable participants were those whose conditions had neither improved nor deteriorated significantly between the test and retest interval. Test–retest reliability was tested with the intraclass correlation coefficient (ICC2,1) two-way random-effect model, for participants who completed the FFI-Ar questionnaire two times. The ICC value for pain, disability and activity limitation subscales was 0.81, 0.93 and 0.80, respectively, indicating good to excellent agreement of test–retest reliability. The ICC level for the FFI-Ar total score was also good, with a value of 0.89.

The test–retest reliability of FFI-Ar statistics is mentioned in Table 5. Bland–Altman plots show slight relevant differences from test to retest in the activity limitation subscale, while no significant difference was seen in pain, disability and the FFI-Ar total score (Fig. 1).

Table 5 Test–retest reliability of the FFI-Ar
Fig. 1
figure 1

Bland–Altman plot for individual subscales and the total score of FFI-Ar

Construct Validity

The Pearson’s correlation coefficient (rho) between FFI-Ar and SF-36 resulted in negative values. This explained by the fact that while a higher FFI score indicates worse health status, a higher SF-36 score indicates better health status. Of the physical component of the SF-36, PF, RP and BP had a moderate correlation with all subscales of FFI, while GH showed a weak correlation (Table 6). Alternatively, all four mental component domains of SF-36 (VT, SF, RE and MH) showed weak correlations with the subscales of FFI. The physical component summary (PCS) had a moderate correlation of -0.58 with the FFI total score, while the correlation with the mental component summary (MCS) was negligible. The strongest correlation was between the PCS of SF-36 and disability subscale of FFI (-0.60). Pearson’s correlation coefficients between FFI and SF-36 are described in Table 6.

Table 6 Pearson’s correlation coefficient of the FFI subscales with the SF-36 domains

Construct validity of FFI-Ar was also confirmed via its strong Pearson’s correlation with NRS pain. The strongest correlation was between the pain subscale of the FFI-Ar and the NRS (rho value of 0.72), while the activity limitation subscale of the FFI-Ar showed a weak correlation. The FFI-Ar total score correlation with the NRS was also good (rho value of 0.57, Table 6). The correlation coefficient between NRS and FFI-Ar resulted in positive values, as higher scores on both scales mean worse health status.

The analysis of correlation upheld the hypothesis of convergent and discriminant validity for FFI-Ar. It was hypothesized that the FFI-Ar would result in a high correlation with the physical domains of SF-36 and NRS and would result in a weak correlation with the mental domains of SF-36. The strong Pearson’s correlation of FFI-Ar with the NRS, physical domains and PCS of SF-36 confirmed its convergent validity. Its weak correlation with the mental domains and MCS of SF-36 proved its discriminant validity.

Discussion

This study aimed to translate the FFI into Arabic and test the psychometric properties. This study suggests that the FFI-Ar is a reliable and valid tool for assessing foot and ankle problems in Arabic populations.

Cultural adaptation of a questionnaire is very important, as it may lead to systematic errors if not addressed [27]. All discrepancies and issues encountered during the process were cross-checked, proofread and carefully solved. The only cultural adaption necessary occurred during the first phase and involved the term “four blocks” in the twelfth item of the questionnaire. Similar adaption changes were also made in the French and Persian versions [22, 24]. The word “block” is informally used as a unit of distance in North America, which is unfamiliar in Saudi Arabia. It was deduced that one block on average is equal to 150 m. Thus, the term “four blocks” was translated as “600 m” in the Arabic version. The majority of the recruited participants found that the adapted version of the FFI-Ar was simple, clear and understandable with very few issues.

For this study, participants with both acute and chronic painful ankle/foot conditions were enrolled to ensure a wide variety of patient problems were represented. Enrolling multiple foot conditions such as plantar fasciopathy, metatarsalgia, ankle sprain, hallux valgus, hallux rigidus, cavus foot and painful flat feet was also adopted in the Brazilian Portuguese [21], Italian [20] and Chinese [21, 23] validation studies of FFI. Statistical analyses revealed that the Arabic version of FFI was reliable and valid, and could be used with Arabic populations. The mean completion time for the FFI-Ar was 5.7 min, which suggests that the questionnaire is feasible to administer within a clinical setting. Comparably, Budiman et al. reported that the completion time for the original FFI was 5–10 min, while the French version took up to 10 min [24].

Good internal consistency of FFI-Ar was found for both the total score and the three subscales, indicating that all items contribute to measuring the same construct. Among all subscales, the disability subscale was the most internally consistent with the Cronbach’s alpha value of 0.94, followed by the pain (α = 0.88) and activity limitations (α = 0.85) subscales. The internal consistency results revealed that the Cronbach’s alpha values of FFI-Ar were comparable to those obtained in other FFI translations, such as the Turkish version (Cronbach’s α 0.82–0.94) [25] and French version (Cronbach’s α 0.85–0.97) [24]. Unlike other validation studies in which the internal consistency of the activity limitation subscale was much lower [18, 19], the result of the present study was good (alpha value of 0.85), which indicates that this subscale of FFI-Ar is measuring the same construct. As such, the subscale was not removed from the index.

The test–retest reliability of FFI revealed good to excellent results for all subscales and the total score. The test–retest reliability in the disability subscale was excellent (ICC = 0.93). This may be the case because disability is usually a stable symptom that does not fluctuate as commonly as with pain. The ICC values for pain, activity limitation and total score were 0.81, 0.80 and 0.89, respectively. The original FFI version showed ICC values of 0.69, 0.84 and 0.81 for pain, disability and activity limitation, respectively [8]. The test–retest reliability of the FFI-Ar was also comparable to the other versions of FFI [19, 24, 32].

The SF-36 is considered a gold standard for measuring criterion validity of PROMs. This is evident in several validation studies of FFI, where the SF-36 was employed to test their validity [7, 16]. The construct validity of FFI-Ar was confirmed by its moderate to high correlation with SF-36. In the current study, as also observed in the Turkish [25] and German [16] validation studies, the physical domains of SF-36 showed high correlations with the FFI. In this study, Pearson’s correlation coefficient of FFI-Ar total score with the PCS of SF-36 was − 0.578, while its correlation with the MCS was − 0.282. Similarly, in the Turkish version, the FFI total score correlation coefficient with the PCS and MCS of SF-36 was − 0.278 and − 0.127, respectively [25].

The NRS also showed significant correlations with the FFI. The highest correlation was seen between the pain subscale of FFI and NRS (Pearson’s correlation coefficient 0.721). The reason for this high correlation may be that that both scales directly measure pain.

From the correlation analysis, the predefined hypothesis of convergent and discriminant validity was also confirmed. The high correlation coefficients of FFI-Ar with the NRS and PCS of SF-36 prove its convergent validity. The weak correlation coefficients between FFI-Ar, the mental health domains and the MCS of SF-36 prove its discriminant validity.

The rationale for the weak correlation coefficient with the mental health domains of SF-36 is that the FFI does not have mental and psychosocial subscales. Similarly, the three existing subscales have no such items related to mental health or social interaction. Due to a lack of related items, the FFI was criticised by some researchers. For this reason, it was revised in 2006 to produce (FFI-R) by adding more items and a subscale about quality of life and psychosocial activities [7]. However, most of the translation and validation studies were performed on the original version of FFI, which was developed in 1991.

Initially, the FFI was developed merely for rheumatoid arthritis of foot and ankle problems. In more recent years, its use has expanded. It is now utilised for multiple foot problems, including post-surgical and post-fractured cases. It has been used in 19 studies related to the efficacy of orthotic management and in 31 studies related to the outcomes of surgical corrections [1125]. Furthermore, it has been employed in numerous cross-sectional and observational studies.

Some limitations are highlighted in the current study; for example, the sample participants were young and the recruitment was done at a single tertiary care hospital.

Conclusions

The FFI-Ar proved to be a feasible, valid and reliable outcome measure for assessing both traumatic and non-traumatic foot and ankle disorders, for Arabic-speaking patients.