Systematic review of measurement properties of the Canadian Occupational Performance Measure in geriatric rehabilitation

Key summary points
Aim: To provide a systematic overview of measurement properties of the Canadian Occupational Performance Measure (COPM) for people in geriatric rehabilitation.
Findings: COPM showed moderate inter-rater reliability, good test–retest reliability, good content and construct validity, and moderate responsiveness in geriatric rehabilitation. When studying construct validity, authors used a variety of comparator instruments and different hypotheses.
Message: This overview of properties shows that the COPM gives relevant information for geriatric rehabilitation, and scores can be assessed reliably and are responsive to change.
Supplementary Information: The online version contains supplementary material available at 10.1007/s41999-022-00692-8.


Introduction
In our aging society, a growing number of older people experience a sudden decline in functioning, for instance due to hip fracture or stroke. As a consequence, they lose their ability to live independently, and their participation in society is limited. Geriatric rehabilitation is rehabilitation for persons with multimorbidity and frailty; it offers treatment focused on improving functioning and participation. Geriatric rehabilitation is offered in widely different settings across the world, depending on national reimbursement policies and local availability [1]. It can be community-based or hospital-based, provided in skilled nursing facilities or on an outpatient basis [2].
During geriatric rehabilitation, it is important to agree on individual goals that both individual patients and professionals regard as important. The Canadian Occupational Performance Measure (COPM) can be useful to explore problems in daily functioning [3]. It is a personalized, client-centered instrument. In a semi-structured interview, the care professional explores occupational performance problems experienced by the patient in three areas of everyday living: self-care, productivity, and leisure. Patients are asked to select up to five of the most important problems and rate their own level of performance and satisfaction with performance on a 10-point scale. From the list of most important problems, two average scores are calculated: the COPM-Performance score (COPM-P) and the COPM-Satisfaction score (COPM-S), each of which can range from 1 to 10. The COPM has been shown to give relevant and comprehensive information, i.e., good content validity, in various populations [4][5][6]. As such, it can provide valuable input for decisions on intervention goals and can guide the rehabilitation process.
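The scoring described above can be illustrated with a minimal sketch. It assumes that each prioritized problem is stored as a (performance, satisfaction) pair of ratings and that both scores are plain arithmetic means; the function name `copm_scores` and the example ratings are hypothetical and not taken from the COPM manual.

```python
from statistics import mean


def copm_scores(ratings):
    """Return (COPM-P, COPM-S) for one patient.

    `ratings` is a list of (performance, satisfaction) pairs, one pair per
    prioritized problem. This data layout is an assumption for illustration.
    """
    if not 1 <= len(ratings) <= 5:
        raise ValueError("expected one to five prioritized problems")
    for performance, satisfaction in ratings:
        if not (1 <= performance <= 10 and 1 <= satisfaction <= 10):
            raise ValueError("ratings must lie on the 1-10 scale")
    # Each score is the average rating across the prioritized problems.
    copm_p = mean([performance for performance, _ in ratings])
    copm_s = mean([satisfaction for _, satisfaction in ratings])
    return copm_p, copm_s


# Hypothetical patient with three prioritized problems.
example_ratings = [(3, 2), (5, 4), (4, 6)]
copm_p, copm_s = copm_scores(example_ratings)
print(f"COPM-P = {copm_p}, COPM-S = {copm_s}")
```

Because the scores are averages over a patient-specific list of problems, two patients with the same COPM-P may have rated entirely different activities.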
Currently, the COPM is mostly used in adult rehabilitation and is slowly finding its way into geriatric rehabilitation. It is important to understand the value of (changes in) COPM-scores, and whether they are helpful in evaluating treatment in this setting. Although there are some reviews on measurement properties of the COPM in a broad range of populations, there are none specifically for the geriatric rehabilitation population [7,8]. Patients in this setting have specific characteristics, such as older age, comorbidity, frailty, and cognitive impairment, that may influence the usability, reliability, and validity of the COPM [2]. We, therefore, performed a systematic review to examine whether the COPM is a valid, reliable, and responsive instrument for measuring problems in occupational performance in the geriatric rehabilitation setting.

Methods
The following seven databases were searched on 28 March 2019, and the search was updated on 14 March 2022: PubMed, Embase, Emcare, Web of Science, Cochrane Library, Academic Search Premier, and PsycINFO. The search string included the terms 'Canadian occupational performance' and one of the following 'Aged, elderly, geriatric, homes for the Aged, Health services for the Aged, Senior Centers, old/older/aging [population or person]' (see supplement 1 for full search string).
To select publications, we used the following eligibility criteria: (1) COPM is mentioned as an assessment tool in title or abstract; (2) the publication reports on measurement properties of interest: content validity (with interviews; exclusion of studies solely describing the total and prioritized problems), construct validity, reliability, and responsiveness; (3) date of publication from 1991 onwards; (4) mean or median age of population 60 years and over; (5) studies in rehabilitation setting.
For the assessment of the publications, we used the method and checklists described in the COSMIN manual for systematic reviews of PROMs [9,10]. For each of the evaluated measurement properties, the methodological quality was evaluated using the criteria for risk of bias outlined in the respective boxes of the COSMIN manual. Each box contains a checklist, and the final methodological quality is determined using the 'worst score counts' method. Because this is a rather stringent method, we report the reason for the worst score. One author (MdW) screened all publications. Two authors (MdW and MH) independently extracted the information from selected publications and then compared their results. When in doubt, co-authors were consulted (AD, WA). For the syntheses, we grouped the studies for each measurement property:
• Content validity is the degree to which the content of an instrument is an adequate reflection of the construct to be measured. We gathered information on three aspects: (1) relevance for construct, for target population, and for context of use; (2) comprehensiveness; and (3) comprehensibility [11].
• Construct validity is the degree to which the COPM-scores are consistent with hypotheses. We gathered information on method used, intervention and follow-up time, and results of analyses: correlations of scores between COPM and comparator instruments (convergent or divergent validity), and differences in scores between groups (discriminative validity). Reports on predictive validity were excluded. We described whether authors defined hypotheses for expected correlations beforehand. We also described the size of the correlations regardless of statistical significance, in accordance with COSMIN. In reporting, we interpreted correlations of sizes < 0.30 as low, 0.30-0.50 as moderate, and ≥ 0.50 as high [12].
• Responsiveness is the ability to detect change over time in the construct to be measured. This is done by comparing change scores on the COPM to change scores on other instruments, by looking at differences between subgroups, or by comparing scores before and after intervention. We gathered information on method used, results of analyses, and hypotheses formulated by authors regarding the effect of the applied intervention. To assess whether change can be expected, we gathered information on the intervention and follow-up time.
• Reliability is the degree to which the measurement is free from measurement error, over time (test-retest reliability) or between persons (intra-rater or inter-rater reliability). We gathered information on correlations or ICC.
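Two of the rules used in the assessment, the 'worst score counts' rating and the interpretation of correlation sizes, can be sketched as follows. This is an illustration under stated assumptions, not the COSMIN implementation: the four rating labels follow the terms used in this review, the function names are hypothetical, a correlation of exactly 0.50 is assigned to 'high', and the sign of the correlation is ignored.

```python
# Rating labels ordered from worst to best (labels as used in this review).
RATING_ORDER = ["inadequate", "doubtful", "adequate", "very good"]


def worst_score_counts(item_ratings):
    """Overall quality of a study on one property: its lowest item rating."""
    if not item_ratings:
        raise ValueError("at least one checklist item rating is required")
    return min(item_ratings, key=RATING_ORDER.index)


def correlation_size(r):
    """Label a correlation as low (< 0.30), moderate (0.30 to < 0.50), or high (>= 0.50).

    Using the absolute value and placing 0.50 in 'high' are assumptions.
    """
    size = abs(r)
    if size < 0.30:
        return "low"
    if size < 0.50:
        return "moderate"
    return "high"


# Hypothetical checklist for one study: one doubtful item decides the rating.
print(worst_score_counts(["very good", "adequate", "doubtful", "very good"]))
print(correlation_size(0.42))
```

The first function makes explicit why the method is stringent: a single low-rated checklist item determines the overall rating, regardless of how the other items were rated.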

Characteristics of included studies (n = 13)
The literature search resulted in 292 publications, of which 43 reported on measurement properties including clinical utility. After reading full texts, 12 publications reporting on 13 studies were selected for inclusion (see supplement 2 figure S2.1, PRISMA flow diagram). They reported on content validity once, construct validity eight times, responsiveness seven times, and reliability three times (see Table 1). Almost two-thirds of the studies (eight of 13) included patients with various diagnoses. The other studies each included patients with one type of diagnosis: stroke, hip fracture, rheumatic diseases, COPD, or Parkinson's disease. The settings were described as acute (one in Canada) or sub-acute in-hospital rehabilitation (two studies in Australia), outpatient setting (one study in Norway, one in the Netherlands, and one in the UK), community- or home-based rehabilitation (one in Norway, one in the Netherlands), or a combination of settings (four studies in Denmark, one in Sweden). Baseline mean or median COPM-Performance (COPM-P) scores ranged from 2.2 to 5.2. The baseline mean or median COPM-Satisfaction (COPM-S) scores ranged from 2.7 to 7.4 (outlier). Mean COPM-scores for Performance and Satisfaction differed only slightly, except for two studies, one with lower COPM-P scores [13] and one with much higher COPM-P scores [14] compared to the COPM-S scores. Three studies did not report baseline COPM-scores. Methodological quality was rated for each measurement property. In the one study on content validity and one study on reliability, the methodological quality was rated as doubtful. For construct validity and responsiveness, methodological quality was rated as very good to adequate in half of the studies (see Fig. 1).

Content validity
Content validity was reported in one study [6]; see supplement 3 (table S3.1). In this study, the constructs of the COPM were described as occupational performance and satisfaction with performance. Patients first completed the COPM-interview and were then asked about coverage of occupations and experiences. Next, these answers and COPM-data were used to test predefined assumptions on content validity. Tuntland et al. concluded that the COPM showed good content validity on the topics relevance, comprehensiveness, and comprehensibility. For the topic comprehensibility, methodological quality was rated as adequate. For the topics relevance and comprehensiveness of the COPM-items, methodological quality was rated as doubtful due to the item 'analysis by 1 instead of 2 researchers', as the involvement of a second researcher was not explicitly described in the text. All other items on appropriateness of methods were rated as very good or adequate.

Construct validity
Construct validity was reported in eight studies; see Table 2. All studies analyzed construct validity by comparing scores between instruments. One study [15] also analyzed discriminative construct validity by comparing groups, as presented in a separate row in Table 2. All but one study [16] analyzed both COPM-P and COPM-S scores. Scores were compared with 11 other instruments for functioning, and for example with instruments for quality of life (EQ-5D, WHO-5), mental functioning (SF-36 mental, FIM cognitive), impact of sickness (AIMS2, SA-SIP30), or coping (SOC-13). One instrument shows many similarities with the COPM: the OSA 'Occupational Self-Assessment', with competences, values, and priorities for change [13,17], although it has a predefined list of competences.
Methodological quality was rated as very good or adequate for five studies. However, one of these studies did not use any preformulated hypotheses [18]. In the other four studies, the hypotheses of the authors were confirmed for COPM-P and COPM-S [6,14,15,19].
In the remaining studies, methodological quality was rated inadequate or doubtful, because measurement properties of the comparator instruments were not reported (see supplement 3 table S3.2). Still, the results of Enemark Larsen et al. are particularly interesting, as they conducted comparisons with the OSA, which, like the COPM, scores values and priorities; however, their hypotheses of moderate correlations with the OSA were not confirmed [13].

Responsiveness
Responsiveness was reported in seven studies; see Table 3.
The first COPM was assessed during intake or (the week of) admission, or 3-5 days after surgery [19]. Follow-up varied roughly from 3 weeks to 4 months. Four studies used a 'construct approach comparator': they compared delta scores on COPM with delta scores on another instrument, using correlations [18][19][20] or predefined hypotheses [6]. Methodological quality scored adequate in these four studies. In three studies, predefined hypotheses were (partly) confirmed, showing moderate responsiveness. For change scores on COPM-P, high correlations were found with change in WOMAC physical function and FIM total and physical scores. Moderate correlations were observed with changes in FIM cognitive scores and SPPB, and low correlations were observed with change in SF-36 physical and mental scores and WHO-5 and EQ VAS. Tuntland et al. and Enemark Larsen et al. also compared COPM change scores in subgroups with no/little/much improvement based on an anchor question on patients' impression of change, and both found relevant differences between groups, especially in COPM-P scores (e.g., for Tuntland et al. 1.5 pt between subgroups little versus much change). As Enemark Larsen et al. did not describe characteristics of the subgroups, methodological quality was rated doubtful.
Three studies used a 'construct approach intervention' and analyzed post-intervention change scores [14,21], and one study also compared standardized response means (SRM) with other instruments [22]. Methodological quality for these studies was doubtful/inadequate. Only Roe et al. formulated hypotheses beforehand; however, these were based on statistical significance and not on the size of the effect. The COSMIN manual advises against interpretation of results when no hypotheses about the size of the effect are formulated in advance. However, a meaningful change (that is > 2 pt, as defined by the authors of the COPM) in COPM-scores was observed in two studies, suggesting good responsiveness.
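For readers unfamiliar with the SRM, the sketch below uses the conventional definition: the mean of the individual change scores divided by the standard deviation of those change scores. The included studies may have computed it differently; the function name and the patient data are hypothetical.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """SRM = mean(change) / SD(change), with change = follow-up minus baseline."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must contain the same patients")
    if len(baseline) < 2:
        raise ValueError("at least two patients are needed to estimate an SD")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    spread = stdev(changes)  # sample standard deviation of the change scores
    if spread == 0:
        raise ValueError("SRM is undefined when all change scores are equal")
    return mean(changes) / spread


# Hypothetical COPM-P scores for four patients at admission and at follow-up.
admission = [3.0, 4.2, 2.8, 5.0]
follow_up = [5.4, 5.0, 6.0, 6.2]
print(round(standardized_response_mean(admission, follow_up), 2))
```

Because the SRM is unit-free, it allows the responsiveness of the COPM to be compared with that of instruments scored on other scales.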

Reliability
Reliability was reported on in three publications with four studies [16,23,24]; see

Discussion
In this systematic literature review, COPM showed moderate inter-rater reliability, good test-retest reliability, and good content validity in geriatric rehabilitation (GR) patients. For construct validity, four studies with minimal risk of bias showed good construct validity. In five other studies, either considerable risk of bias was present, or the authors did not formulate hypotheses for their comparisons between instruments, which hampered our [...] (25 studies up to 2014) in the introduction of their publication, without reporting on their search methods or age of study populations [26]. Carswell et al. systematically included 19 studies up to 2004 with various populations and concluded that the COPM is a valid, reliable, clinically useful and responsive outcome measure [7]. The latter review included findings from different kinds of settings, and studies were not critically appraised on methodological quality. In the present systematic review, we focused specifically on the GR setting and took a more rigorous approach. We included 9 different studies compared to the previous reviews (8 studies after 2014), thus providing more up-to-date and substantiated results. Two researchers extracted results and scored risk of bias independently following COSMIN guidelines [9]. This is a very strict method, as the item with the lowest score defines the methodological quality.
For each property of the COPM, we found at least one study in the geriatric rehabilitation setting. In general, this gave sufficient evidence, although the number or quality of studies that is needed as evidence has been a point of discussion [27]. We chose to include studies of patients with an average age of 60 years and older, which some might consider too low for geriatric rehabilitation, as in an EU survey most referred patients were older than 70 years [28]. However, we followed the European consensus statement that recommends focusing on frailty rather than age when referring patients to geriatric rehabilitation [1,2]. If we had limited ourselves to a mean/median age of 70 years and older, our conclusions would have been the same (based on 6 studies) except for inter-rater reliability (no study ≥ 70 years). Two of the ten included studies were in a home-based setting. Although this was geriatric rehabilitation, this population is probably cognitively less impaired than patients in institutional settings.
It is noteworthy that a large variety of instruments was used in the comparison with COPM-scores to determine construct validity. Not only measures of physical functioning, but also instruments for quality of life, mental functioning, impact of sickness, and coping were used. Moreover, we found that for the same (type of) instruments, hypotheses among authors varied considerably (see Table 5). Apparently, authors have different opinions on what COPM-scores tell us, and this shows that the construct of the COPM is ambiguous. This is also the reason why we chose not to follow the COSMIN suggestion to formulate our own construct hypotheses and interpret study results accordingly for this review. Also, it is questionable whether studying divergent construct validity is informative. Three studies that found low correlations between the COPM and other measures concluded that these confirmed their hypotheses regarding construct validity [6,16,22]: because the COPM is an individual (patient-unique) measure, low correlations would support the notion that the COPM provides information that is not obtainable with other standardized measures. Cup et al. added that only 25% of problems reported in the COPM were present in the standardized measures. However, these low correlations merely confirm what the COPM does not measure and do not tell us anything about the construct that the COPM does measure.
Looking beyond measurement properties, various studies examining feasibility and clinical utility of the COPM found that implementation of the COPM in practice is not without difficulty. Challenges regarding scoring are often mentioned, in particular for patients with cognitive impairment [6,14,22,29]. Older clients in particular may not be familiar with the use of scales or understand their meaning. As the COPM leaves room for a personalized interview style, the patient's ability to score the instrument may also depend in part on the interviewing skills of the occupational therapist. Kjeken et al. also reported problems with clients' ability to perceive the difference between satisfaction and performance scores [22]. This is in line with our observation that studies report COPM-P and COPM-S scores in the same range. In a literature review [30] and in qualitative research [14,29] it was concluded that the COPM ensures a client-centered approach and facilitates client engagement. This is especially important in geriatric rehabilitation because, more than rehabilitation for younger persons, GR is about finding a new balance, often with a higher degree of dependency, while trying to preserve autonomy and self-management. Kjeken et al. found that some patients are anxious about being an active participant in the treatment process, but remarked that this in itself can be valuable information for therapists [22]. In conclusion, training and a good introduction to the COPM seem necessary, as the therapist must develop a client-centered approach.
Structured assessments by healthcare professionals are important to evaluate rehabilitation progress. Our results show that COPM-scores may play a role in the evaluation of geriatric rehabilitation on an individual level. We found that responsiveness was moderate in three studies that scored adequate for methodological quality. Worth mentioning is that Tuntland et al. and Enemark Larsen et al. studied interpretability, and both recommended a higher cut-off point for minimal important change, using, respectively, 3 or 3.5 instead of 2 points mentioned in the COPM-manual [6,20]. However, it is still uncertain whether aggregated (average) COPM-scores from various departments or care organizations can be used in benchmarking. Unfortunately, for this purpose, evaluation of geriatric rehabilitation is mostly based on the easily available parameters such as length of stay and costs.
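The practical effect of the cut-off for minimal important change can be illustrated with a small sketch. It applies the three cut-offs mentioned above (2, 3, and 3.5 points) to a set of individual COPM change scores. The change scores are invented for illustration, the function name is hypothetical, and counting a change equal to the cut-off as important is an assumption.

```python
def proportion_with_important_change(change_scores, cutoff):
    """Share of patients whose COPM change score reaches the cut-off."""
    if not change_scores:
        raise ValueError("at least one change score is required")
    reached = sum(1 for change in change_scores if change >= cutoff)
    return reached / len(change_scores)


# Hypothetical individual COPM-P change scores (follow-up minus baseline).
changes = [1.0, 2.5, 3.0, 4.0]

for cutoff in (2, 3, 3.5):
    share = proportion_with_important_change(changes, cutoff)
    print(f"cut-off {cutoff}: {share:.0%} of patients")
```

With the same data, a higher cut-off classifies fewer patients as meaningfully improved, which is one reason why the choice of cut-off matters when individual or aggregated scores are interpreted.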
To conclude, the use of the COPM will give occupational therapists and the multidisciplinary team information that is relevant for geriatric rehabilitation, as shown by the study on content validity. This can help to make treatment more personalized and client-centered. Also, the progress of the rehabilitation can be evaluated, because the COPM-scores can be assessed reliably and are responsive to change. And although there were many studies on construct validity, authors had different opinions on exactly what COPM-scores tell us, as they used a variety of comparator instruments and different hypotheses. As such, consensus on exact interpretation of the scores is needed, especially of aggregated (average) scores outside the context of direct patient care, e.g., when comparing groups of patients in research or in benchmarking.
Table 5 Overview of comparator instruments used in studies for construct validity, with predefined hypotheses of authors. Instruments are in alphabetical order, for physical functioning and other constructs