Background

Patient-reported outcome measures (PROMs) reflect patients’ perspectives on their health status, functioning and quality of life (QoL) [1] and are also useful for informing clinical and healthcare decision-making [2]. Since April 2009, the National Health Service (NHS) in England has required patients undergoing surgery to provide PROMs data before and after treatment. The current PROMs programme covers patients undergoing varicose vein, groin hernia, knee replacement and hip replacement surgery [3]. Presently, PROMs are not routinely collected for patients with peripheral arterial disease (PAD), a condition associated with substantial disability, morbidity and mortality [4]. PAD is caused by widespread atherosclerosis of the lower limbs and may be asymptomatic in the early stages. A common initial presentation of PAD is atypical leg pain. Pain may occur in a specific group of muscles in the lower limb during effort (this is referred to as intermittent claudication). Severe stages of PAD present as rest pain in the legs, leg ulcers or gangrene, collectively known as critical limb ischaemia (CLI). The mainstay of treatment is to improve symptoms, delay disease progression, prevent tissue loss and modify risk factors [4, 5].

Validation studies provide valuable evidence for selecting appropriate PROMs for use in clinical and research settings. In this review, the term validation study refers to a study reporting the evaluation of one or more measurement properties of a PROM—including its validity (the degree to which the instrument measures what it is supposed to measure); reliability (the degree to which measures are reproducible and consistent over time in patients with a stable condition); responsiveness (the degree to which the instrument detects meaningful change over time) and acceptability (the degree to which the instrument is acceptable to the patient). A suitable PROM must demonstrate its validity, reliability, responsiveness and appropriateness in a relevant patient population [6]. Confirmation of these psychometric properties must be obtained from sources (i.e. context of study, patient factors and study characteristics) similar to those in which the PROMs will be applied [6].

A better understanding of the psychometric properties of PROMs obtained from English-speaking patients with PAD will help to select an appropriate tool for patients managed within the NHS. Therefore, this study sought to (1) identify English language publications reporting the psychometric evaluation of PROMs in patients with PAD, (2) critically appraise eligible studies, and (3) examine the psychometric properties of identified PROMs to inform the development of a valid and reliable instrument to incorporate into an electronic personal assessment questionnaire (ePAQ) as part of a project to inform the reconfiguration of vascular services in the UK.

Methods

A systematic review of peer-reviewed English language articles was undertaken according to recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) group [7], the Oxford system and the Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) group [8, 9], with the aim of identifying validation studies in a well-defined population of English-speaking patients with symptomatic PAD. The study’s protocol is available on request from the authors.

Literature searches

Comprehensive searches using a two-stage approach were conducted in Medline and Medline in Process, EMBASE, the Cochrane Library, CINAHL, PsycINFO and Web of Science from date of inception up to August 2013 (Search 1) and up to February 2014 (Search 2). Updated searches were conducted in Medline and Medline in Process in January 2015. Search 1 sought to identify studies reporting PROMs in patients with PAD, while Search 2 aimed to identify studies reporting the development and/or validation of relevant PROMs. Relevant PROM terms were identified from scoping searches, discussions with experts and previous research relating to relevant outcome measures. Search terms in the Search 1 strategy included free text terms and Medical Subject Heading (MeSH) terms related to: (1) PAD; (2) known generic PROMs and (3) known condition-specific PROMs. Additional PROMs were identified following examination of titles and abstracts of records retrieved from Search 1. All potentially relevant articles were also coded at this stage. The Search 2 strategy comprised all terms used in Search 1, together with (1) additional PROM terms identified from sifting retrieved records and (2) a methodological search filter for locating studies reporting measurement properties. Search strategies were adapted for searching within different databases. Search strategies used in Medline are available as Additional file 1.

Further searches were conducted in the PROMs Bibliography (Oxford University) and the Patient-Reported Outcome and Quality of Life Instruments database (PROQOLID) [10]. References of identified systematic reviews and included studies were examined for potentially eligible studies. All retrieved records were transferred and managed within a single reference management database.

Study selection

Study selection was undertaken by one reviewer from a pool of 4 reviewers (EP, ME, PP, RD) and checked by a second reviewer. Eligibility criteria are summarised in Table 1. Disagreements were resolved by discussion and referred to a third reviewer when needed. After excluding duplicates and records that did not appear to be relevant on examination of titles and abstracts, the full texts of all potentially relevant articles were obtained for detailed review.

Table 1 Criteria for considering eligibility of studies for inclusion in the review

Studies including English-speaking patients with a diagnosis of PAD were included in the review. Proficiency in English was indicated or assumed if studies were conducted in countries where English is an official language and/or reported that 80% or more of participants were English speakers. Studies published in English but reporting outcomes obtained from translated instruments, i.e. non-English translations of relevant PROM instruments or English versions of non-English PROMs, were excluded. This was considered an acceptable approach to overcome the uncertainty due to language validation and cross-cultural adaptation of PROMs [11].

Data extraction

Data extraction was completed by one author (either EP, ME, PP, RD or AK) and checked by another author. All disagreements were discussed and resolved by consensus. Data were abstracted into a piloted standardised form and comprised patient characteristics, study characteristics, names, domains, items and reported psychometric evaluations of identified PROMs.

Quality assessment

The methodological quality of studies was assessed using the COSMIN checklist [12]. This checklist comprises 114 items organised as 12 boxes related to the following measurement properties: validity (including structural validity, content validity, criterion validity and cross-cultural validity), internal consistency, reliability, measurement error, responsiveness and hypothesis-testing. A 4-point rating scale (excellent, good, fair or poor) was applied, with the overall methodological quality scores presented using a “worst score counts” method per box [13]. The COSMIN checklist also covers interpretability and generalisability, which were assessed but not scored.
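
The “worst score counts” rule can be illustrated with a minimal sketch. The item ratings below are hypothetical, and the function name is an assumption made for illustration; neither is taken from the COSMIN materials or from the included studies.

```python
# Ratings ordered from worst to best, following the 4-point scale.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def worst_score_counts(item_ratings):
    """Return the lowest rating among the items of one checklist box."""
    if not item_ratings:
        raise ValueError("a box needs at least one rated item")
    # The position in RATING_ORDER defines which rating is worse.
    return min(item_ratings, key=RATING_ORDER.index)


# Hypothetical item ratings for a single box (e.g. reliability).
box_items = ["excellent", "good", "fair", "good"]
box_score = worst_score_counts(box_items)
print(box_score)
```

Under this rule a single weakly rated item determines the score for the whole box, which is one reason box-level scores tend to be low.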

Due to the lack of consensus on how to appraise PROMs, study-specific criteria were adapted from various sources [2, 8, 14–17] as outlined in Table 2 and used for the assessment of psychometric performance of identified PROMs.

Table 2 Appraisal criteria for assessing the psychometric properties of patient reported outcome measures

Data synthesis and analysis

Tabular and narrative syntheses of study characteristics were undertaken. A summary of psychometric criteria was completed based on the Oxford system and the COSMIN group system [8, 9]. The following combined rating scales were allocated: (0) for not reported; (−) for evidence not in favour; (+/−) for conflicting evidence; (?) for questionable methodology and (+) for evidence in favour.
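
One way such a summary rating could be derived from per-study findings is sketched below. The combination rule is an assumption made for illustration only, since the review does not spell out an algorithm: studies flagged "?" are set aside whenever at least one other study gives a usable result. Plain ASCII symbols stand in for the symbols used in the text.

```python
def combine_ratings(study_ratings):
    """Collapse per-study ratings for one property into a summary symbol.

    Assumed rule (not taken from the review): "?" marks a study with
    questionable methodology; such studies are ignored when at least one
    other study provides a usable "+" or "-" result.
    """
    if not study_ratings:
        return "0"  # property not reported by any study
    usable = [r for r in study_ratings if r in ("+", "-")]
    if not usable:
        return "?"  # only questionable methodology available
    if all(r == "+" for r in usable):
        return "+"  # evidence in favour
    if all(r == "-" for r in usable):
        return "-"  # evidence not in favour
    return "+/-"  # conflicting evidence


# Hypothetical findings for one measurement property across three studies.
print(combine_ratings(["+", "-", "+"]))
```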

Results

Of the 6893 records retrieved from searches, 14 studies with data for 13 PROMs were found to be eligible to be included in this review as shown in Fig. 1. Twenty-eight full-text articles were excluded because they reported outcomes using ‘non-eligible’ PROMs (i.e. English translations of non-English PROMs and non-English versions of relevant PROMs), included study populations for whom outcomes were not clearly reported or presented no data on psychometric evaluations.

Fig. 1 Flow diagram of study selection

Study characteristics

Table 3 provides a summary of study characteristics. Studies were conducted in Australia [18], the UK [19–25] and the USA [26–31]. All studies were conducted as prospective observational studies. Missing information relating to study setting [22, 30, 31], diagnostic criteria of participants [18, 24, 26, 28, 30, 31] and schedule for assessment of PROMs was noted.

Table 3 Table of characteristics of included studies

Participants’ characteristics

Data were available for 1594 patients presenting with symptomatic PAD. Sample sizes ranged from 26 to 295 patients, with more than 50% of included studies reporting study populations of fewer than 100 participants. Overall, men made up between 54% [22] and 91% [27] of study populations. Diagnostic criteria and management strategies varied across studies. Included studies fell into 2 broad categories based on the diagnosis of patients: studies with (i) patients with IC only [18, 20, 22, 24, 27, 29] or (ii) patients with different degrees of severity of PAD [19, 21, 23, 26, 28, 30, 31].

Psychometric data

Data relating to the psychometric evaluation of 6 generic PROMs and 7 condition-specific PROMs in patients with PAD were available. The most frequently assessed generic questionnaires were the SF-36 [18–23, 26, 27, 30, 31] and the EQ-5D [19, 20, 22, 26]. The King’s College Hospital Vascular Quality of Life Questionnaire (VascuQoL) [22, 23, 25] and the Walking Impairment Questionnaire (WIQ) [20, 24, 26–31] were the most commonly reported condition-specific measures. Two studies reported the evaluation of the Claudication Scale (CLAU-S) and the Estimation of Ambulatory Capacity by History Questionnaire (EACH-Q), which were originally developed in France [24] and Germany [22], respectively, alongside relevant PROMs. Information relating to the CLAU-S and EACH-Q was excluded from this review.

Information about the development of the WIQ [29], Intermittent Claudication Questionnaire (ICQ) [20], VascuQoL [23], Peripheral Artery Questionnaire (PAQ) [30] and the PAD Quality of Life Questionnaire (PADQOL) [31] was found in 5 studies. Limited information about the development of the WIQ was noted [29]; however, for the remaining instruments, studies reported methods consistent with recommended standards [11, 32]. Items, domains, response options and scoring of identified PROMs are presented in Table 4.

Table 4 Table of items, domains, response options, scoring and administration of included outcome measures

In relation to the COSMIN checklist, the methodological quality was assessed by totalling the number of boxes scored at each level from poor to excellent. Across the included studies, 36.8% of boxes (n = 42) were rated as poor, 40.3% (n = 46) as fair, 21.9% (n = 25) as good and 0.9% (n = 1) as excellent. Details of quality assessment are presented in Additional file 2: Table S1.
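
The percentages in this paragraph are shares of rated COSMIN boxes rather than of studies. A minimal sketch of the arithmetic, using the box counts reported above (variable names are illustrative):

```python
# Box counts per rating, as reported in the text.
box_counts = {"poor": 42, "fair": 46, "good": 25, "excellent": 1}

# The denominator is the total number of rated boxes, not the number of studies.
total_boxes = sum(box_counts.values())

# Share of boxes at each rating level, in percent.
shares = {rating: 100 * n / total_boxes for rating, n in box_counts.items()}

for rating, share in shares.items():
    print(f"{rating}: {box_counts[rating]} of {total_boxes} boxes ({share:.1f}%)")
```

With these counts the denominator is 114 boxes. The unrounded share for “fair” is about 40.35%, so conventional rounding to one decimal place gives 40.4% rather than the 40.3% quoted.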

Assessment of psychometric properties

The timing of assessments of the validity of PROMs varied across studies and sometimes within the same study [29]. Data on responsiveness were reported for the WIQ [29]; ICQ [20]; VascuQoL [22, 23]; SF-8 [21]; SF-36 [19, 21–23]; EQ-5D [19, 22]; Nottingham Health Profile (NHP) and Sickness Impact Profile-intermittent claudication (SIPic) [22]. Test-retest reliability of PROMs was reported in 8 studies. Follow-up periods varied, with intervals of 1 week [19, 20, 27], 2 weeks [20, 21, 26] and 1 month [23, 30]. A summary of reported psychometric properties of identified PROMs is presented in Table 5.

Table 5 Summary of the psychometric properties of patient-reported outcome measures in patients with peripheral arterial disease

Generic patient-reported outcome measures

Eleven studies assessed the construct validity of the SF-36. Five studies [20–23, 30] reported good evidence, with the remainder presenting mixed evidence. Evidence for the internal consistency of the SF-36 was negative in one study [18] and positive in another [30]. Only one study [25] reported positive evidence on responsiveness, while six studies [19–23, 30] found mixed evidence. Test-retest reliability was assessed in 4 studies: 2 studies provided evidence in favour of test-retest reliability [18, 30]; one study [19] described positive evidence using simple correlations but provided no information on the time interval between administrations of the measures; and the remaining study [21] assessed reliability using Spearman correlations instead of intra-class correlation coefficients.
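
The distinction between Spearman correlations and intra-class correlation coefficients matters because a rank correlation ignores systematic shifts between test and retest, whereas an agreement-type ICC penalises them. The sketch below illustrates this with hypothetical scores in which every retest value is 10 points higher than baseline. The ICC form used here, ICC(2,1) (two-way random effects, absolute agreement, single measurement), is an assumption; the review does not state which form the included studies reported.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def icc_2_1(scores):
    """ICC(2,1) from rows of patients and columns of occasions."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical test-retest scores: every patient scores 10 points higher at retest.
baseline = [40, 55, 60, 72, 85]
retest = [score + 10 for score in baseline]

rho = spearman_rho(baseline, retest)
icc = icc_2_1([list(pair) for pair in zip(baseline, retest)])
print(f"Spearman rho = {rho:.2f}, ICC(2,1) = {icc:.2f}")
```

In this example the rank order is perfectly preserved, so the Spearman coefficient is 1, while the ICC is roughly 0.85 because the systematic 10-point shift counts against agreement.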

Positive evidence for construct validity and mixed evidence for responsiveness of the SF-8 were reported [21]. One study provided mixed evidence for construct validity and positive evidence for responsiveness of the SF-6D [25]. The quality of study methodology was shown to be good for construct validity, mixed for test-retest reliability and poor for responsiveness.

Of the 5 studies evaluating the EQ-5D, one study showed positive evidence for construct validity [25]; 2 studies reported mixed evidence [19, 26] whereas the remaining studies [20, 22] had poor methodologies, subsequently limiting further assessment.

The responsiveness of the NHP was found to be favourable, but construct validity and floor/ceiling effects were associated with mixed evidence [19]. In the examination of the construct validity of the Profile of Mood States (POMS), no prior hypotheses about the strength and direction of the relationship between the POMS and other measures were reported [31]. However, the results presented showed statistically significant correlations with the PADQOL factors [31].

Condition-specific patient-reported outcome measures

Three papers evaluating the VascuQoL provided good evidence for its construct validity and responsiveness [22, 23, 25]. Content validity and internal consistency were found to be positive in one study [23], with some evidence in favour of test-retest reliability [23]. Evidence for internal consistency, test-retest reliability, responsiveness and acceptability was explored in relevant studies relating to the WIQ [20, 24, 26–31]. Spertus et al. [30] reported a Cronbach’s alpha of 0.94, indicating a possible overlap with other domains on the measure. Two studies [26, 30] found good evidence for the construct validity of the WIQ; however, the others [20, 24, 27–29] reported inconsistent evidence. A single study of exercise therapy [29] found positive evidence for the responsiveness of the scale, but mixed evidence was described by two studies [20, 30].

One study reported good evidence on internal consistency and reliability of the AUSVIQOL in patients with PAD [18], but there was mixed evidence for construct validity. Overall, the study’s methodology was rated as fair. Good evidence about the internal consistency, test-retest reliability, construct validity and responsiveness of the PAQ was presented by Spertus et al. [30]. The PAQ was developed after a review of the medical literature, examination of the available measures, focus groups with clinicians and unstructured interviews with patients, suggesting positive content validity. However, the methodology of the study was found to be poor. Good evidence was observed for the internal consistency and content validity of the PADQOL in one study [31]. Generally, the reported methodology was rated as good, but construct validity was found to have mixed evidence due to the lack of prior hypotheses [31]. The measurement properties of the ICQ were examined in one study [20], which reported a Cronbach’s alpha of 0.94, indicating high correlation between items. Positive results were also found for test-retest reliability, content validity and responsiveness. In this study, mixed evidence was found for construct validity due to the lack of a clear hypothesis. The methodology used to assess these criteria was generally good, although responsiveness received only a fair rating [20]. The SIPic was evaluated in patients with lifestyle-limiting claudication [22]. Good evidence was found for construct validity and mixed evidence for responsiveness.
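
Cronbach’s alpha, the internal consistency statistic cited in this section, can be computed from item-level responses as alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The sketch below applies this standard formula to hypothetical responses; the data are invented for illustration and are not drawn from any included study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of respondents and columns of items."""
    k = len(responses[0])
    items = list(zip(*responses))  # one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: 5 respondents answering 4 items on a 1-5 scale.
responses = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [4, 4, 4, 3],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Because the hypothetical items rise and fall together across respondents, alpha is high here. Very high values can indicate that items are largely redundant rather than that the scale is better.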

Two studies reported the psychometric assessment of modified PROM instruments. These were the modified telephone-administered WIQ [26] and the SF-8, an abridged version of the SF-36 [21]. Both the originally developed telephone-administered WIQ and the modified self-administered version were reported to be valid and reliable for objectively assessing community walking. The authors proposed that self-administration reduced the WIQ completion time from five minutes to one minute [26].

Discussion

Fourteen studies assessing the psychometric properties of 13 newly developed and existing PROMs in patients with symptomatic PAD, regardless of specific presentation, were included in this review. Substantial variations in the reporting of clinical presentation of PAD, management strategies and administration of instruments were noted. Evidence of superiority in the psychometric performance of a single PROM could not be established. This may reflect differences in patient characteristics and study methodology rather than the appropriateness of the instruments themselves.

Clinicians and researchers have a wide variety of PROMs to consider for patients with PAD. The review included generic PROMs as well as PROMs that covered PAD-related symptoms, e.g. VascuQoL (pain), WIQ (walking speed) and PADQOL (symptoms and limitations of function, fear and anxiety). Of the generic PROMs evaluated, the SF-36 showed the most complete and positive evidence in favour of use in a PAD population. The domains of the SF-36 provided a broader measure than the PAD-specific PROMs. This instrument included further questioning on the domains of pain and mobility, but also on specific fears. However, related studies were of mixed methodological quality. The review showed that using modified versions of the WIQ and SF-36 provided useful PROMs data in terms of test-retest reliability, construct validity and responsiveness. Nonetheless, adopting these instruments in practice requires more consideration of their appropriateness, given the extent of variation in the available literature. Although the WIQ provides a good condition-specific measure of mobility relevant to IC, it does not include QoL measures relating to PAD in general. The VascuQoL was found to have good internal consistency, test-retest reliability, construct validity and responsiveness as well as good content validity for measuring QoL of patients with PAD.

Several factors may influence the choice of a PROM. Careful consideration is required regarding whether a combination of measures should be recommended for use in symptomatic patients or whether a single PROM covering different aspects of health would be more appropriate for obtaining the patient’s perspective on treatment and general health. Furthermore, patients’ characteristics (stage of PAD, treatment, co-existing conditions) must be carefully considered. Included studies dealt with patients with symptomatic PAD, and more research is needed to understand the relevance of using PROMs in those with asymptomatic PAD. This is of particular importance because PAD represents a continuum of clinical presentations. A decision about whether to use a single PROM or a set of PROMs in practice should be at the discretion of clinicians or researchers. One key area of attention, however, should be the burden of administering a questionnaire (including format, setting and time for completion). Coyne et al. [26] reported that the modified (self-administered) WIQ was reliable and valid when compared to the version administered by an interviewer over the telephone. However, recent evidence suggests that the number of errors occurring during self-completion of the WIQ was unacceptably high [33, 34], which has implications for administering a tool as well as for interpreting the findings of self-completed PROMs. Furthermore, limited or unclear reporting makes inferences about completion time challenging. Methods for calculating completion time, additional support provided and reading level of participants were often not reported within included studies.

Whereas it is not possible to single out one measure for recommendation, it is evident that condition-specific measures were the only tools with reported content validity related to PAD. Based on the findings of this review, the PAQ and VascuQoL would seem to be appropriate condition-specific tools for predominantly English speakers. The ICQ could be selected as a tool of choice for patients with intermittent claudication, only. Measurements of PROMs must be practical, acceptable and reliable. Therefore, qualitative evidence based on patients’ views and experiences will also be valuable. Additionally, clinical trials which incorporate PROMs as outcome measures may be used to assess the performance of relevant PROMs but this is beyond the scope of this paper. Collectively, such evidence will help in selecting PROMs for use in routine practice.

Clear and complete reporting of validation studies is essential. The quality of reporting in the included studies was often inadequate or ambiguous. For example, patient selection was not presented in a meaningful way in most studies. Whilst some studies explicitly stated that patients with more severe forms of PAD were excluded, a few studies did not provide information to identify any stratification of the study population. The ankle-brachial index (ABI) cut-offs for selecting patients in included studies were often not reported or varied across studies.

In this review, the methodological quality of the studies was evaluated on the basis of the COSMIN criteria. However, this checklist is time-consuming to apply and, although it provides a method for assessing the quality of the studies, it has been criticised for being difficult to apply in a consistent manner [14]. The current review, similar to the study by Morris et al. (2014) [14], also demonstrated that many of the included studies had not reported how missing information was handled. The approach used for handling missing data is a key criterion of the COSMIN checklist. Consequently, most of the studies were rated low in terms of quality. Another systematic review of PROMs in patients with IC found that the methodological quality of most studies ranged from poor to fair [35]. Our review supports the findings of the review by Conijn et al. [35], confirming the need for better quality studies of PROMs.

Strengths and limitations of the review

Comprehensive and iterative literature searching was undertaken to improve the retrieval of relevant studies. Our efforts improved article retrieval because more than half of included studies were not identifiable as validation studies by titles only. This review identified PROMs for patients with IC and other stages of PAD. It is possible that the differences in clinical states may have influenced the findings of psychometric assessments. Previous reviews have been much more restricted in their scope and limited in the range of sources searched. By broadening the scope of the population of interest, this study has also highlighted the evidence gap regarding validation of PROMs in patients with more severe forms of PAD or more specifically, patients with amputation due to PAD.

In an effort to identify suitable PROMs for patients receiving care for PAD within the NHS in England, we excluded non-English populations and PROMs developed or available in languages other than English. As a result, potentially informative data, for example from validation of non-eligible PROMs [36–38], were not included in this study. The impact of excluding non-English populations or PROMs in this review is unclear. However, this approach was reasonable because of challenges with linguistic validation and cultural adaptation of outcome measures [9]. Literature searches were updated in January 2015, so more recent relevant studies may have been missed.

Implications for practice and future research

Due to heterogeneity and methodological quality of studies included in this review, no single PROM can be recommended for use. It is recommended that clinicians and researchers take into account the factors related to the burden of administration, patient characteristics and treatment strategies when selecting appropriate PROMs. Any suitable instrument should aim to cover all relevant domains of interest to patients.

The standardisation of study methodology and reporting must be encouraged with a view to improving interpretation of the findings of validation studies. Existing minimum standards for PROMs [39] provide useful guidelines for designing, choosing and validating PROMs. These standards can be used alongside the COSMIN checklist to design and report validation studies. The next stage of our research is to complete a qualitative review to obtain patients’ views about factors that significantly affect their daily functioning and QoL whilst living with PAD, and a review of PROMs as outcomes in randomised studies. It is anticipated that the evidence created will inform the selection or development of a new tool to obtain PROMs in patients with PAD attending clinics within the NHS in England.

Conclusions

This review provides an in-depth summary of PROMs evaluated in English-speaking patients with symptomatic PAD. No study provided evidence of a full psychometric evaluation in the patient population of interest. The consideration of diverse factors will help to identify a suitable PROM or combination of measures for clinical and health care decision-making. Additionally, standardised methodologies will help to substantially improve the interpretation of findings from validation studies.