Background

Cognitive impairment is increasing globally [1], as is the global population over 60 years old [2]. Pain is a well-documented, very prevalent issue in older adults with cognitive impairment, who often suffer from conditions like musculoskeletal disorders, malignancy, gastrointestinal and cardiac conditions [3,4,5]. It is estimated that at least 50% of older adults with cognitive impairment residing in long-term care (LTC) facilities have pain on a regular basis [6, 7].

Pain assessment is essential for adequate pain management [7, 8], but assessing pain in older adults with cognitive impairment remains a challenging issue due to impaired memory, changes in cognitive processing, and a reduced ability or inability to communicate verbally [7, 9]. Thus, caregivers may need alternative methods to obtain information about the person’s pain. When older adults with cognitive impairment cannot report pain themselves, the next best option – the so-called ‘silver standard’ – is assessment by the person who is most familiar with the patient’s everyday life [10]. However, previous research has reported that pain assessment in older adults with cognitive impairment often depends on a health care provider’s (HCP) subjective impression and occasionally appears to be mere guesswork [11, 12]. Therefore, in clinical practice, it may be useful for HCPs to use pain assessment tools that account for the population’s distinctive characteristics. However, pain assessment tools are used infrequently, which may contribute to the fact that un(der)managed pain remains a major problem in this population [6, 13, 14]. Furthermore, there is limited evidence regarding the measurement properties, feasibility and clinical utility of pain assessment tools for older adults with cognitive impairment. Currently, no one particular tool is recommended [9, 15, 16]. However, a 2014 meta-review that reviewed 28 tools developed specifically for pain assessment in people with dementia identified the Doloplus-2 pain scale as one of the better tools currently available [9].

The Doloplus-2 is based on the Doloplus, which was developed by Wary et al. in 1993 [17]. The Doloplus was based on a tool that used behaviour to assess pain in children with neoplastic disease (the Douleur Enfant Gustave Roussy scale). The Doloplus assessed pain in older people with verbal communication difficulties by assessing their behaviour using three subscales: somatic, psychomotor and psychosocial reactions to pain. Each subscale included five items (for a total of 15 items), and each item received a score of 0, 1 or 2 [18]. In 1994, a network of geriatricians from Switzerland and France began developing the Doloplus-2, based on the Doloplus. The Doloplus-2 has the same three subscales, but the total number of items was reduced to ten:

  1) Somatic reaction to pain includes five items: ‘somatic complaints’, ‘protective body postures adopted at rest’, ‘protection of sore areas’, ‘expression’ and ‘sleep pattern’.

  2) Psychomotor reaction to pain includes two items: ‘washing and/or dressing’ and ‘mobility’.

  3) Psychosocial reaction to pain includes three items: ‘communication’, ‘social life’ and ‘behaviour problems’.

The ten items on the Doloplus-2 are scored from 0 to 3; higher scores represent more intense pain [19]. The total score can range from 0 to 30. The score for the somatic reactions subscale ranges from 0 to 15, the psychomotor reaction subscale ranges from 0 to 6, and the psychosocial subscale ranges from 0 to 9. If the rater considers an item inappropriate, the item is not scored. A combined score of 5 or higher suggests the presence of pain [19].
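The scoring rules described above can be illustrated with a short Python sketch. This is not an official implementation of the scale. The item keys, the function name `score_doloplus2` and the use of `None` for an item the rater leaves unscored are assumptions made for this example. Adding nothing to the total for an unscored item is also an assumption, since the handling of such items is not specified here.

```python
from typing import Dict, Optional

# Item keys are illustrative labels for the ten items, grouped by subscale.
SUBSCALES = {
    "somatic": [
        "somatic_complaints",
        "protective_postures_at_rest",
        "protection_of_sore_areas",
        "expression",
        "sleep_pattern",
    ],
    "psychomotor": ["washing_dressing", "mobility"],
    "psychosocial": ["communication", "social_life", "behaviour_problems"],
}
CUT_OFF = 5  # a total score of 5 or higher suggests the presence of pain


def score_doloplus2(ratings: Dict[str, Optional[int]]) -> dict:
    """Sum item ratings (0-3) per subscale; None marks an unscored item."""
    result = {}
    scored_items = 0
    for subscale, items in SUBSCALES.items():
        subtotal = 0
        for item in items:
            value = ratings.get(item)
            if value is None:
                continue  # assumption: an unscored item adds nothing
            if value not in (0, 1, 2, 3):
                raise ValueError(f"{item}: rating must be 0, 1, 2 or 3")
            subtotal += value
            scored_items += 1
        result[subscale] = subtotal
    total = sum(result[name] for name in SUBSCALES)
    result["total"] = total
    result["items_scored"] = scored_items
    result["pain_suggested"] = total >= CUT_OFF
    return result


# Hypothetical patient: three items rated above 0, one item left unscored.
example = {item: 0 for items in SUBSCALES.values() for item in items}
example.update({"somatic_complaints": 2, "expression": 2, "mobility": 1})
example["social_life"] = None
print(score_doloplus2(example))
```

In this hypothetical case the total is 5, which just reaches the cut-off, and nine of the ten items are scored. Reporting the number of scored items alongside the total makes it visible when a low total partly reflects items that were left out.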

The Doloplus-2 covers most of the pain behaviour categories recommended in the American Geriatrics Society’s guidelines for ‘The management of persistent pain in older persons’ [20]; only ‘change in mental status’ is missing. The Doloplus-2 includes the categories ‘facial expression’, ‘verbalizations/vocalization’, ‘body movements’, ‘changes in interpersonal interactions’ and ‘changes in activity patterns or routines’. The Doloplus-2 indicates a progression of pain rather than pain experienced in a specific moment [16]. An HCP (e.g. physician, registered nurse, nursing assistant) who knows the patient well should score the Doloplus-2. According to the developers, a trained HCP can complete the scale in approximately five minutes [17]. The Doloplus-2 was officially validated in 1999 and was published in English in 2001 [17, 19]. The tool has since been translated into many different languages [21,22,23,24].

Several reviews of pain assessment tools for older adults with cognitive impairment have been published, including a meta-review [9]. Some of these include the Doloplus-2 [15, 16, 25,26,27]. However, more studies on the Doloplus-2 have been published since the last systematic review in 2012 (these reviewers conducted a systematic search up to 2010) [26]. The Doloplus-2 is one of the more extensively tested tools for pain assessment [9, 15], and it has been identified as one of the most promising tools for pain assessment in older adults with cognitive impairment [9]. Furthermore, the scale is used in clinical practice and research across the world. For this reason, this review focuses solely on the Doloplus-2. It seeks to thoroughly examine the scale’s feasibility, clinical utility and measurement properties when used to assess pain in older adults, as this evidence remains incomplete. A feasible, useful and accurate scale is essential to ensure that older adults in pain are correctly identified as such, consistently and over time. Furthermore, for a pain scale to guide pain management decisions and support efficient evaluations, it must be actionable and easy to interpret, and it cannot take so many resources that it disrupts clinical care. Therefore, this systematic review examines the feasibility, clinical utility and measurement properties of the Doloplus-2 scale when used to assess pain in older adults with cognitive impairment.

Method

This systematic review was prospectively registered with PROSPERO under reg. no. CRD42016049697. The PRISMA guidelines for reporting on systematic reviews were followed. Due to the clinical, methodological and statistical heterogeneity of the included studies, a descriptive approach was adopted in the research synthesis.

Data sources and search strategy

A systematic search was conducted in CINAHL (March 2016), Medline (August 2016) and PsycINFO (September 2016) in collaboration with a research librarian. The search strategy was formulated in CINAHL and adapted in Medline and PsycINFO, using keywords, Boolean operators and the database’s controlled vocabulary. The results were limited from 1990 to the dates the searches were performed (Additional file 1).

In addition to the systematic search, a search for the keyword ‘Doloplus’ was performed in the three databases (February 2017). In CINAHL, ‘all text’ was selected so that the entire article text was searched for the term ‘Doloplus’. Medline and PsycINFO do not have the ‘all text’ option for searching with keywords, so only titles and abstracts were searched for the keyword. The systematic and keyword searches in all three databases were saved immediately, and e-mail alerts were set up for every search. We received automatic e-mail notifications from all three databases whenever a new publication matching our search criteria (for the systematic or the keyword search) became available in the database. These monthly auto-alerts were reviewed until April 2017, and articles which met the inclusion criteria were included in this review.

In addition to the database searches, the list of previous publications (including publications from 1993 to 2008) provided on the Doloplus-2 online home page was reviewed. Articles which met the inclusion criteria were included.

Eligibility criteria

A study was eligible for inclusion if it: i) used the Doloplus-2 to assess pain in cognitively impaired patients (any stage) aged 65 and older; and ii) was published in English, French, German, Dutch/Flemish or a Scandinavian language. Studies in which the Doloplus-2 was described but not used were excluded, as were studies in which the scale was used to validate other observational pain assessment tools. Dissertations, editorials, guidelines and expert opinion papers were excluded as well. Literature reviews were also excluded since they do not contain original data.

Process of study selection

The studies were selected in two steps. First, two reviewers independently screened the titles and abstracts to determine the studies’ eligibility for inclusion. Discrepancies and uncertainties were discussed by the reviewer team until a consensus was reached. In the second step, two reviewers independently assessed the full text of the articles for eligibility. The reference lists of the included articles were also reviewed for additional eligible studies to supplement the data sources previously described.

Quality assessment

Two reviewers independently assessed the quality of the included studies using the Mixed Methods Appraisal Tool (MMAT) [28]. The 2011 version of the MMAT allows for the description and appraisal of the methodological quality of five types of studies: i) qualitative, ii) quantitative randomized controlled trials, iii) quantitative non-randomized, iv) quantitative descriptive, and v) mixed methods. Each type has its own set of quality criteria. The criteria are scored ‘yes’, ‘no’ or ‘can’t tell’, followed by comments. The MMAT’s inter-rater reliability is moderate to excellent [29]. Since this is the first systematic review of the Doloplus-2, we wanted to provide a comprehensive review of the scale, so no study was excluded based on the quality assessment.

Data abstraction

All the reviewers used a standardized data abstraction sheet. Two reviewers independently abstracted information from the studies, including study objective, setting, sample characteristics, how the Doloplus-2 was administered and the results of the assessment, and clinical utility and feasibility data. Feasibility was defined as the time and resources required to collect and process the assessment, encompassing ease of use, the need for staff training, and the time required to complete the assessment [30]. Clinical utility was defined as ‘usefulness to clinical practice’: the scale’s usefulness in identifying pain and whether the result of the assessment could assist clinical decisions (e.g. administration of analgesics) [10]. Information about the Doloplus-2’s measurement properties was also abstracted. As a guide for abstracting data on measurement properties, we used the COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes [31]. Different authors propose various criteria for assigning strength of association to particular values, but we chose the guidelines for instrument reliability and precision suggested by Hahn et al. [32].

Results

A total of 2692 citations were initially identified for possible inclusion through the systematic search of the three databases. The citations were transferred into Endnote and duplicates were removed; 2131 unique citations remained (Box A). An additional 649 publications were identified through other sources (Box B). There were so many additional publications because the other sources were manually screened, and we did not have a reference system to remove duplicates or those already retrieved through the systematic search. In total, 2780 publications were screened. After the titles and abstracts were reviewed, 42 full-text studies were assessed for eligibility. We were unsure whether five articles met the eligibility criteria, and we attempted to contact the corresponding author via e-mail. For two of those, no e-mail address was found. Of the three authors contacted, two did not respond, and one provided sufficient information [33]. Consequently, four studies were excluded because we were unable to determine whether they fulfilled the eligibility criteria [34,35,36,37]. Fourteen more studies were excluded based on a review of the full text (see Fig. 1). Articles reporting on the same research project but describing different or new results were included as separate sources [22, 38], [39,40,41] and [42, 43]. A qualitative synthesis was conducted on a total of 24 studies.
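The counts reported in this paragraph can be checked with simple arithmetic. The following Python sketch only restates the numbers given above; the variable names are illustrative.

```python
# Counts taken from the study selection paragraph above.
identified = 2692          # citations from the three databases
unique_after_dedup = 2131  # unique citations after duplicate removal (Box A)
other_sources = 649        # publications from other sources (Box B)

duplicates_removed = identified - unique_after_dedup
screened = unique_after_dedup + other_sources

full_text_assessed = 42
excluded_eligibility_unclear = 4   # eligibility could not be determined
excluded_after_full_text = 14      # excluded based on the full text

included = (
    full_text_assessed
    - excluded_eligibility_unclear
    - excluded_after_full_text
)

print(duplicates_removed, screened, included)
```

The sums reproduce the 2780 screened publications and the 24 studies included in the qualitative synthesis.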

Fig. 1 Flow diagram

Quality assessment

The quality assessment of the included studies is presented in Additional file 2. For 19 of the studies that used a quantitative descriptive approach, it was unclear if the sample was representative of the population under study [21,22,23,24, 33, 38,39,40,41,42, 44,45,46,47,48,49,50,51,52]. Furthermore, 13 studies did not provide sufficient information regarding response rate [21, 23, 24, 33, 38, 40, 44, 47,48,49,50,51,52].

Characteristics of included studies

Eight studies used a prospective observational design [22, 24, 38, 41, 44, 48, 49, 53], and five used a cross-sectional observational design [39, 40, 42, 43, 45]. Other studies used an action research design [52] or a pre- and post-test design [54, 55]. One was a pilot randomized controlled trial [56]. Seven studies did not report the study design [21, 23, 33, 46, 47, 50, 51].

Participants were recruited using random sampling [51], purposive sampling [48], convenience sampling [39,40,41, 46, 47] and consecutive sampling [33]. In sixteen studies, the recruitment method was not reported [21,22,23,24, 38, 42,43,44,45, 49, 50, 52,53,54,55,56].

The characteristics of included studies are shown in Table 1. Twelve studies were conducted in Europe [21, 24, 33, 42,43,44, 47, 49, 52, 53, 55, 56], five in North America [39,40,41, 46, 51], one in Australia [48] and five in Asia [22, 23, 38, 45, 54]. One study [50] was a multinational collaboration between six countries. Of the 24 articles included, 23 were written in English, and one was written in Swedish [33].

Table 1 Characteristics of included studies

Eleven studies were conducted in an LTC setting [21, 22, 24, 38, 42, 43, 45, 48, 52, 53, 56]. Others were conducted in a hospital [23, 33, 44, 49, 51, 54, 55] or a combination of various settings [39,40,41, 46, 47, 50]. The sample sizes ranged from N = 6 [23] to N = 405 [45] participants; the percentage of female participants ranged from 33% [45] to 83% [48]. The mean age of participants ranged from 78.4 [23] to 88.1 [53]. Four studies [44, 49, 51, 55] used mixed samples that included patients with and without cognitive impairment. The ability of the participants to self-report pain varied across the included studies; nine defined the participants as nonverbal or unable to self-report pain [21,22,23, 33, 42, 43, 47, 50, 53], while in other studies, all of the participants were able to self-report their pain [24, 38, 49, 51, 54, 55]. For nine studies, the authors did not report the participants’ abilities to self-report pain or communicate verbally [39,40,41, 44,45,46, 48, 52, 56].

Feasibility and clinical utility

Table 2 shows the studies that examine the feasibility and the clinical utility of the Doloplus-2. Only four studies explicitly address feasibility and/or clinical utility [21, 22, 24, 53], but relevant information was also found in other studies. The mean totals of the Doloplus-2 baseline measurement ranged from 3.5 [45] to 22.7 [56]. All but two studies used all ten items in the scale [44, 51]. Every study that applied a cut-off used the recommended cut-off of ≥5 out of 30 [21,22,23,24, 33, 38,39,40,41,42,43, 45, 48, 49, 52,53,54, 56]. The percentage of participants who scored above the cut-off (indicating pain) ranged from 19% [49] to 96% [53].

Table 2 Feasibility and clinical utility of the Doloplus-2

Feasibility

Fifteen studies reported, in varying detail, that the raters received some form of training in how to use the Doloplus-2 to collect data [21,22,23, 38, 42, 43, 45,46,47,48,49,50, 53,54,55]. Nine of the studies included clear (but brief) information about the content of the training [21,22,23, 38, 42, 45, 46, 50, 54]. The training method was reported in nine studies [22, 38, 42, 43, 45, 46, 49, 50, 54], and six reported the duration or amount of training [7, 22, 38, 45, 46, 49]. Every study that described the trainer reported that a member of the research team provided the training [22, 39,40,41,42,43, 46, 48, 49, 54]. Two studies simply mention that training was provided without providing any details [47, 55], and one [53] refers to the procedure of another study [50]. In one study, raters gave feedback on the importance of being trained in data collection using the Doloplus-2 and of knowing the patients’ normal behaviour in order to use the Doloplus-2 correctly [21].

Ten studies specified that the raters were familiar with the patients’ normal behaviour [21,22,23, 38, 42, 43, 45, 47, 48, 50]. In the remaining studies, this was not clear or not reported. Most of the Doloplus-2 assessments were conducted by a person with a background in nursing [21,22,23,24, 33, 38,39,40,41,42,43, 45,46,47,48,49, 53, 54], sometimes in collaboration with research assistants (RA) or a researcher. In other studies, physicians [50] or an occupational therapist [56] performed the assessments. A description of the raters was not provided or was unclear in four studies [44, 51, 52, 55]. One study reported the initial impact of nurses’ qualifications: more highly qualified nurse raters tended to assign higher pain ratings on the Doloplus-2. The effect of nurse qualifications seemed to disappear with repeated use of the scale, and the number of raters did not bias the result [48].

On average, it took raters five to ten minutes per patient to complete the Doloplus-2 [49, 50, 53]. The raters thought that the scale’s administrative burden was small [21]. They also thought that the Doloplus-2 was feasible [23] and easy to use [50, 53] and that the manual was clear [24].

Clinical utility

In one study, after a year of regular Doloplus-2 assessments, patients’ pain scores decreased significantly, and HCPs’ use of analgesic therapy with non-opioids (Step 1 of the WHO pain ladder) increased significantly, from a baseline of 30% to 100% [53]. In another pre- and post-test study, participants in the experimental group were assessed with the Doloplus-2 and received significantly more analgesics than the control group, which was not assessed with the Doloplus-2 [54].

Some studies also evaluated the Doloplus-2’s usefulness. One study found that the scale was useful in assessing pain [22], whereas another study reported that the Doloplus-2 was the least useful of the three pain scales evaluated [24]. The scale has been reported to facilitate valuable discussions about patients [21]. Raters using the Doloplus-2 stated that the psychosocial items were difficult to understand and score [22, 24] and that these items should be cautiously scored because abnormal social reactions can also be caused by dementia [21]. Furthermore, the highest congruency between Doloplus-2 scores over 5 and registered nurses (RN) reporting ‘don’t know’ when proxy-rating pain was found on the psychosocial subscale [42].

When comparing the Doloplus-2 with other methods used to assess pain in older adults with cognitive impairment, one study in a nursing home found that nurses evaluated significantly more patients as having pain when using the Doloplus-2 than when proxy-rating pain. With proxy-rating alone, nurses were not able to say whether one-third of the patients appeared to be in pain [42]. A second study found that patients reported more pain using the Visual Analogue Scale (VAS) than nurses did using the Doloplus-2 [49]. The same study also found that of all the patients who self-reported pain, only one in five scored ≥5 on the Doloplus-2. This raises the question of whether the cut-off score should be adjusted [42, 49]. The different study populations (verbal and nonverbal) may explain the different results. It is possible that pain behaviour in people who are able to self-report is different to that of people who cannot self-report due to more advanced cognitive impairment.

Measurement properties

Seventeen studies reported on one or more measurement properties of the Doloplus-2 (Table 3).

Table 3 Measurement properties of the Doloplus-2

Reliability

Internal consistency

The Cronbach’s alpha for the total scale ranged from 0.67 [49] to 0.84 [33, 49], indicating low to moderately good internal consistency across settings. The alpha coefficients for the total scale did not increase when any of the items were deleted [22], but they were lower for patients with dementia than for those who were not cognitively impaired [49]. The items in the Doloplus-2 are heterogeneous, so they are not expected to correlate well with each other since they reflect a variety of dimensions [42].

The Cronbach’s alpha for the subscales ranged from low to moderate or good internal consistency in the different settings, including nursing homes (0.60 to 0.84) [22, 42].
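For readers less familiar with the statistic, Cronbach’s alpha for k items is commonly computed as k/(k − 1) × (1 − sum of item variances / variance of the total score). The Python sketch below applies this standard formula to made-up ratings. The included studies do not describe their computations in this review, so the function name `cronbach_alpha` and the data layout are assumptions for illustration only.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; scores[i][j] is the rating of patient i on item j."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two patients")
    # Sample variance of each item across patients.
    item_vars = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Sample variance of each patient's total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when all total scores are equal")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings (0-3) of four patients on three items.
ratings = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 2],
    [3, 2, 3],
]
print(round(cronbach_alpha(ratings), 2))
```

Because alpha rises when items covary strongly, a scale whose items deliberately cover different dimensions of pain behaviour can show a modest alpha without this necessarily indicating a flaw.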

Test-retest reliability

Test-retest reliability was high to excellent in one study in a hospital setting (Intraclass Correlation Coefficient (ICC) = 0.96) [49]. The test-retest reliability for multilingual versions of the test in multiple settings was moderately good to high or excellent; the ICC ranged from 0.62 (the Dutch version) to 0.98 (the Italian version) [50].

Inter-rater reliability

Inter-rater reliability was tested using different statistical techniques (ICC, Pearson correlation, Kappa statistics, Wilcoxon signed rank, paired t-test, matching scores) [22, 23, 47, 48, 50]. Agreement among raters ranged from 0.73 [48] to 0.97 [50], indicating moderately good to high or excellent inter-rater reliability across settings. Agreement for the subscales ranged from 0.60 to 0.84 [22]. One study compared pain level categorizations (the Doloplus-2 total score was used to classify patients into groups with mild, moderate or severe pain) across raters and found moderately good agreement (0.42 and 0.50) on two testing occasions [48]. The mean κ values for pairs of raters at each pain intensity level (mild, moderate, severe) increased as pain intensity increased (from mild 0.04 to severe 0.38) [51]. High intensity behaviour is more obvious and most likely easier for raters to spot and agree on. One study found no statistically significant differences between the two raters in the total score [33]. Another study found no difference between mean total scores for RA-RN pairs but found a statistically significant difference between the mean total scores of RA-Nursing Assistant (NA) pairs; the NAs reported more pain cues than the RAs [38]. In another study, the proportion of matching scores between researchers and RNs was 77.5% (p < 0.01) [23].
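As an illustration of one of the agreement statistics mentioned above, Cohen’s kappa corrects the observed proportion of agreement between two raters for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The Python sketch below uses made-up pain categories. The function name `cohen_kappa` and the data are assumptions for this example, and the included studies may have used other kappa variants.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who categorize the same patients."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of patients")
    n = len(rater_a)
    # Observed proportion of patients placed in the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical categorizations of four patients by two raters.
rater_1 = ["mild", "mild", "severe", "severe"]
rater_2 = ["mild", "severe", "severe", "severe"]
print(cohen_kappa(rater_1, rater_2))
```

In this example the raters agree on three of four patients (0.75), but half of that agreement is expected by chance, so kappa is 0.5. This is why kappa values are typically lower than raw percentages of matching scores.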

Validity

Content validity

The degree to which the (items of an) instrument seems to be an adequate reflection of the construct to be measured was only addressed in one study, which reported that the scale pinpoints important pain clues [21].

Construct validity

A 1-factor solution was the best description in two studies using exploratory factor analysis [33, 48]. In a study using principal component analysis, items loaded on three factors, and each item correlated with the subscale to which it originally belonged in addition to the overall scale [22]. A single-factor model best described the correlation between the Doloplus-2 and two other observational pain assessment tools (the Abbey Pain Scale and the Checklist of Nonverbal Pain Indicators), indicating that these scales measure the same single construct [48].

Cross-cultural validity was examined in three studies. In these, a group of experts or the raters of the scale reviewed the content of the translated versions of the Doloplus-2 [21,22,23].

To consider ‘hypothesis testing’, one study examined the correlations between the Doloplus-2 and the so-called ‘known correlates of pain’. This study found a statistically significant correlation between the Doloplus-2 and functional ability and depression in dementia [22]. Another study reported that there was no statistically significant difference between mean scores on the Doloplus-2 facial items across different levels of pain intensity [51]. A known-groups technique was used to compare the Doloplus-2 scores of a ‘no pain’ group and a ‘daily pain’ group. This study found that the mean score was clearly higher in the ‘daily pain’ group than in the ‘no pain’ group. Another study reported low correlations between the Doloplus-2 and other measures of pain (the Pain Assessment Checklist for Seniors with Limited Ability to Communicate, the Pain Assessment in Advanced Dementia, the Visual Analogue Scale (VAS) and the Verbal Rating Scale) [24]. However, it is possible that self-rated pain, hypothesized correlates and other observational measures of pain assess different dimensions of pain than the Doloplus-2 [22, 48]. One study reported that several items on the Doloplus-2 are related to delirium, depression and/or the severity of dementia; item 10 (‘Problems of behaviour’) on the psychosocial subscale appears to be the least specific [46].

Criterion validity

Five studies reported on the correlation between the Doloplus-2 and a ‘gold standard’ or ‘pain criterion’ [33, 42, 48, 49, 51]. A moderately high correlation (Spearman 0.7) was reported for the University of Alabama Birmingham Pain Behaviour Scale [33]. One study reported a low correlation (Pearson 0.4) with RNs’ yes/no rating of patient pain [48], and another study found that significantly more patients were evaluated as experiencing pain when using Doloplus-2 than with RNs’ proxy rating of pain [42]. No significant correlations were observed between the Doloplus-2 and the Facial Action Coding System at any level of pain intensity (mild, moderate or severe) [51].

One study reported a low correlation (Spearman 0.46) with patients’ self-assessment (VAS), but the correlation was higher in patients without dementia than in patients with dementia. Moreover, the Doloplus-2 predicted 41% of the variability in pain intensity as measured by the VAS, with the somatic dimension explaining the most [49]. Two studies compared the Doloplus-2 to experts’ pain ratings on the Numeric Rating Scale (NRS)-11. One found that the criterion validity of the Doloplus-2 was satisfactory and that the Doloplus-2 explained 62% of the experts’ pain score; the item ‘facial expression’ alone explained 48% of the experts’ scores [21]. The second study that used pain experts found no association between the experts’ ratings and the Doloplus-2 scores [47]. However, in this study, the criterion validity increased when the Doloplus-2 was administered by a specialized geriatric nurse [47].

Responsiveness

Four studies examined the ability of the Doloplus-2 to detect changes in pain over time [53,54,55,56]. One study reported a statistically significant reduction in the total mean score after one year of monthly assessments [53], while three studies demonstrated a statistically significant reduction in the total [54,55,56] and subscale scores [55] post-treatment.

Discussion

This review synthesizes the available research on the feasibility, clinical utility and measurement properties of the Doloplus-2 pain scale in older adults with cognitive impairment. Previous reviews have concluded that there is limited evidence for the feasibility, clinical utility, and validity of the measurement properties of pain assessment tools for older adults with cognitive impairment [9, 15]. Based on the 24 studies summarized in this review, we draw a similar conclusion for the Doloplus-2. Of the studies evaluated, only four were assessed as high-quality studies based on the MMAT. There were significant variations in the designs and methods of analysis in the included studies. The majority were performed in LTC settings with patients with cognitive impairment and used small, heterogeneous samples, which limited the possibility of sub-group analyses. Consequently, it is difficult to draw conclusions about the suitability and effectiveness of the scale in various subpopulations (i.e. varying types and degrees of cognitive impairment). Furthermore, the methods of assessing pain with the Doloplus-2 varied across the studies. There was considerable variation in how the studies reporting on at least one of the COSMIN measurement properties assessed reliability, validity and responsiveness. The same was true of the handful of studies that explicitly assessed feasibility and clinical utility, which also used small samples.

Because older adults with cognitive impairment (especially in the severe stage) often have a limited ability to communicate pain, their expressions of pain may not be obvious and may be difficult to interpret. Consequently, it is essential that clinicians and researchers use appropriate, effective tools when assessing pain in older adults with cognitive impairment. Furthermore, the measurement properties of such tools are not fixed attributes of the scale and vary according to population [57, 58], and validation is a long process which needs to be repeated [47, 59]. These findings have several implications for clinical practice and future research.

First, it must be further evaluated whether and how the results of the Doloplus-2 assessment can guide clinical decisions and improve patient outcomes. This may vary across settings and populations. One important issue is whether all of the Doloplus-2 items detect pain, rather than other symptoms, in older adults with cognitive impairment [21, 22, 24, 46]. The overlap between manifestations of pain and those of delirium, dementia and/or depressive symptoms can make it difficult to assess and confidently identify pain (distinct from delirium or depressive symptoms) in this population, who are prone to these comorbidities [60, 61]. This may affect treatment decisions based on Doloplus-2 assessments and the quality of the pain management. Previous studies have reported that nurses and physicians experience some uncertainty about the accuracy of pain assessment in older adults with cognitive impairment, and they may be reluctant to administer analgesics as a result of this uncertainty [8]. A combination of Doloplus-2 assessment with the use of observational tools to evaluate comorbidities such as depressive symptoms and delirium may increase the scale’s validity and its ability to provide significant clinical information about pain in this population.

The Doloplus-2 is one of the few observational pain assessment tools that provides a cut-off to categorize patients with ‘pain’ and ‘no pain’ [9]. The developers of the Doloplus-2 recommend a cut-off ≥5, but they also point out that pain cannot be excluded even with a score below 5 [17, 19]. A cut-off score can make the results of the assessment easier to interpret and more meaningful and actionable [58, 62] in clinical practice and research. To our knowledge, this cut-off, which is based on clinical experience [19], has not been evaluated. Questions have been raised about whether the established cut-off will entail an under- or overestimation of pain [43, 49]. According to the Doloplus-2 Group, higher scores indicate increasing pain intensity [19]. However, there is no evidence supporting the assumption that HCPs can determine pain intensity from patient behaviour [15], nor is there evidence suggesting that it is appropriate to assume that intensity of behaviour is proportional to intensity of pain. Therefore, we argue that the Doloplus-2 only indicates whether a patient may be in pain or not; it does not indicate anything about the intensity of the patient’s pain. Thus, there is a need to validate the cut-off score and to examine HCPs’ interpretations of the (change in) score. How the score informs clinical decisions and actions must also be evaluated, as this is an important indication of the scale’s clinical utility in everyday practice.

Second, more research is needed concerning the feasibility of the Doloplus-2 across settings and populations. There appear to be large variations in how the Doloplus-2 is administered. These variations include the raters’ professional qualifications, the training provided (if any), and raters’ familiarity with the patients’ usual behaviour and habits. As the developers of the Doloplus-2 point out, using the scale requires training [17]. The raters need to understand how it works and the terminology used in the scale. Use of the scale also requires an ability to note changes in a patient’s usual behaviour and an awareness of pain and pain control in older adults not able to self-report pain [17, 19] in order to plausibly achieve the best fit between the rater’s assessment and the patient’s experience [9].

However, while such an ideal situation might be feasible for a research study, is it feasible for everyday clinical use? Providing training and securing the availability of staff familiar with patients demands many resources and may impede the scale’s feasibility. Across health care settings, staff turnover is high and changing work shifts are common. Furthermore, a shortage of nurses is projected in the next 10 to 20 years [63]. Therefore, the most realistic scenario is a care facility in which the scale is administered by a significant number of HCPs who have varying amounts of training, professional and personal skills, and familiarity with the patients, which may affect its reliability [38].

The administration, scoring and interpretation of the scale also need to be described in an unambiguous, reproducible manner. According to the Doloplus-2 guidelines, items on the scale should not be scored if they do not apply to the patient [17]. This is a methodological concern because the total score is affected by unanswered items. It is not clear whether a minimum number of items must be answered in order to use the scale correctly [54]. Consequently, if the Doloplus-2 is to be used in everyday clinical practice, it may be necessary to evaluate the scale’s guidelines and determine what actually works in the variety of settings where older adults with cognitive impairment receive health care. Furthermore, how to effectively and easily facilitate everyday use while obtaining valid, reliable results should be explored.
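
The concern about unanswered items can be illustrated with a small arithmetic sketch. It is not part of the Doloplus-2 guidelines and does not describe how any included study handled missing items. It assumes that the scale consists of 10 items, each rated from 0 to 3 (this structure is not stated in this section and is treated here as an assumption), and it uses the developers' recommended cut-off of ≥5. The function name `summarize_score` and the example ratings are hypothetical.

```python
from typing import Optional, Sequence

CUT_OFF = 5          # developers' recommended threshold: total >= 5 suggests pain
MAX_ITEM_SCORE = 3   # assumption: each item is rated on a 0-3 scale


def summarize_score(item_scores: Sequence[Optional[int]]) -> dict:
    """Sum the answered items and compare the total with the fixed cut-off.

    Items that were not scored (because they did not apply to the patient)
    are passed as None and simply left out of the sum.
    """
    answered = [s for s in item_scores if s is not None]
    for s in answered:
        if not 0 <= s <= MAX_ITEM_SCORE:
            raise ValueError(f"item score out of range: {s}")
    total = sum(answered)
    return {
        "total": total,
        "answered": len(answered),
        # The highest total that was reachable shrinks with every unscored item,
        # while the cut-off stays fixed.
        "max_possible": MAX_ITEM_SCORE * len(answered),
        "meets_cut_off": total >= CUT_OFF,
    }


if __name__ == "__main__":
    # Hypothetical ratings for the same observation, scored by two raters.
    all_items_scored = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]
    three_items_skipped = [1, 1, 0, None, 0, None, 0, None, 0, 0]

    print(summarize_score(all_items_scored))
    print(summarize_score(three_items_skipped))
```

In this hypothetical example, the rater who scores all items reaches the cut-off, whereas the rater who leaves three items unscored does not, although both observed the same behaviour. The sketch only shows why a fixed cut-off and a variable number of answered items interact; whether totals should be rescaled, or a minimum number of answered items required, is exactly the open question raised above.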

Third, the Doloplus-2 is based on sound assumptions about the multidimensionality of pain. Its items are supported by the literature on how older adults who are unable to communicate verbally express pain [15]. However, the results of our review suggest that there is limited research on the validity of the content of the Doloplus-2. No studies have been done to determine whether clinicians and experts in the various fields of caring for older adults with different types and stages of cognitive impairment consider the scale to be comprehensive. As previously discussed, some items of the Doloplus-2 have been reported to be difficult to administer, probably because the items are somewhat unspecific regarding pain, which may lead to uncertain results. Even though face validity only provides information about whether the Doloplus-2 appears to measure pain, it is still important, as clinicians and experts need to have confidence in the scale’s relevance to the construct they want to measure.

Furthermore, it is necessary to evaluate whether the items are equivalent in all multilingual versions, and whether all translated versions of the Doloplus-2 are conceptually, semantically and operationally equivalent [58] to the original French version. If different versions of the Doloplus-2 are not equivalent, it is uncertain whether observed differences in, for example, pain prevalence assessed with the Doloplus-2 are due to actual differences in pain or subtle variations in what the tool is actually measuring. Comparing results and interpreting differences or similarities must be done with caution [58]. Additionally, translation issues, such as ambiguous wording that different raters may understand differently, may lead to inconsistency in scoring some items [21].

The results of our review suggest that it is difficult to establish the construct and criterion validity of the Doloplus-2. The studies included in this review used a variety of hypothesized pain criteria and pain correlates (measures for the same/unrelated constructs) to test these aspects of the scale’s validity. Moreover, tests were conducted under a wide range of circumstances and in a wide range of samples. Because pain is subjective, there is no gold standard to use as a benchmark for the assessment of pain in older adults with cognitive impairment, and that makes it difficult to evaluate the scale’s criterion validity [9].

There is also a lack of interventional studies using rigorous investigation methods, and there is limited evidence regarding the responsiveness of the Doloplus-2. An unresponsive instrument may indicate an improvement in the patient’s pain when there actually is none, or it may fail to detect true improvement. There is some controversy over trying to test ‘responsiveness’ as a property of an instrument as it is hard to disentangle the instrument’s characteristics from the characteristics of the treatment provided [58]. However, it is important for clinicians and researchers to know if an intervention induces change in the patient’s condition. Therefore, future research should investigate whether the Doloplus-2 measures change in a meaningful way and whether it can be used to evaluate the effect of pain treatments in older adults with cognitive impairment.

Strengths and limitations

This review has several strengths. We used systematic methods and multiple sources to identify relevant studies. We also included articles written in languages other than English. Two reviewers independently assessed the titles, abstracts and quality of the studies. The MMAT was used for quality assessment to allow for the different study designs included in this review, and, in order to provide a comprehensive review, studies were not excluded based on methodological quality. Two reviewers independently abstracted data according to the COSMIN guidelines; this meant that measurement properties were assessed in a uniform way to avoid confusion regarding relevance, terminology, definitions and design.

One limitation of this review is that the authors of the included studies may have used different definitions for the measurement properties than those provided by COSMIN, which may have led us to misinterpret or misrepresent their findings. An example provided by the COSMIN initiative is the definition of ‘responsiveness’, which may be defined as “the ability to detect clinically important change” or as “the ability to detect change in the construct to be measured”. These definitions reflect different constructs [31].

Furthermore, our findings are limited due to the heterogeneity of the included studies. Also, some quality criteria of included studies may have been rated as insufficient simply because the necessary information was not available. Four studies that may have had important findings were excluded because we were unsure whether they fulfilled the inclusion criteria. Although we tried to contact the authors of these articles, we were unsuccessful, which may be due to the fact that some of these studies were published ten to fifteen years ago. Finally, approximately one-third of our included studies were retrieved from the supplementary sources. This might indicate a possible bias in the systematic search strategy in the databases, such as missing indexed terms, possibly resulting in a lower number of articles and thereby incomplete conclusions.

Despite these limitations, our review is relevant for both clinicians and researchers. It provides valuable insight into the evidence regarding the use and the measurement properties of the Doloplus-2. It also highlights some of the complex, challenging issues in the field of pain assessment in older adults with cognitive impairment.

Conclusion

The Doloplus-2 has been cited as one of the more extensively tested and promising tools for pain assessment in older adults with cognitive impairment. Still, this review suggests that there is a lack of comprehensive, high-quality evidence regarding the feasibility, clinical utility and measurement properties of this scale when assessing pain in older adults with cognitive impairment. Further research should examine the Doloplus-2 across a range of settings. Moreover, future studies should use more homogeneous samples and provide clear definitions of the type and stage of cognitive impairment and pain. Also, more studies should be done using rigorous methods and large sample sizes in order to better allow clinicians and researchers to assess the tool’s effectiveness and appropriateness for measuring pain in older people with cognitive impairment.