Assessment of the quality of measures of child oral health-related quality of life

Gilchrist, Fiona; Rodd, Helen; Deery, Chris; Marshman, Zoe

doi:10.1186/1472-6831-14-40

Research article
Open access
Published: 23 April 2014

Volume 14, article number 40, (2014)

BMC Oral Health

Abstract

Background

Several measures of oral health-related quality of life have been developed for children. The most frequently used are the Child Perceptions Questionnaire (CPQ), the Child Oral Impacts on Daily Performances (C-OIDP) and the Child Oral Health Impact Profile (COHIP). The aim of this study was to assess the methodological quality of the development and testing of these three measures.

Methods

A systematic search strategy was used to identify eligible studies published up to December 2012, using both MEDLINE and Web of Science. Titles and abstracts were read independently by two investigators and full papers retrieved where the inclusion criteria were met. Data were extracted by two teams of two investigators using a piloted protocol. The data were used to describe the development of the measures and their use against existing criteria. The methodological quality and measurement properties of the measures were assessed using standards proposed by the Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) group.

Results

The search strategy yielded 653 papers, of which 417 were duplicates. Following screening of abstracts and full papers, 120 papers met the inclusion criteria. The majority of papers reported cross-sectional studies (n = 117) with three of longitudinal design. Fifteen studies which had used the original version of the measures in their original language were included in the COSMIN analysis. The most frequently used measure was the CPQ. Reliability and construct validity appear to be adequate for all three measures. Children were not fully involved in item generation, which may compromise the content validity of the measures. Internal consistency was measured using classic test theory with no evidence of modern psychometric techniques being used to test unidimensionality of the measures included in the COSMIN analysis.

Conclusion

The three measures evaluated appear to be able to discriminate between groups. CPQ has been most widely tested and several versions are available. COHIP employed a rigorous development strategy but has been tested in fewer populations. C-OIDP is shorter and has been used successfully in epidemiological studies. Further testing using modern psychometric techniques such as item response theory is recommended. Future developments should also focus on the development of measures which can evaluate longitudinal change.

Background

Patient reported outcomes can be defined as: “reports coming directly from patients about how they feel or function in relation to a health condition and its therapy without interpretation by healthcare professionals or anyone else” [1]. The drive for the use of patient reported outcome measures (PROMs) has come from the shift from a biomedical perspective to a broader biopsychosocial model of health [2]. The proposed benefits of such an approach to patient care are [3]:

1. patients themselves are in the best position to assess the improvement in their symptoms or quality of life
2. involving patients in their healthcare
3. observer bias can be reduced
4. consideration of patients’ views increases public accountability

PROMs were initially developed for use in research and were subsequently developed further by clinicians to allow evaluation of individual patients. The increasing prioritisation of this approach to patient care allows the patient’s perception of the effects of clinical intervention to be understood by both clinicians and researchers [4]. As many dental conditions have psychological and social implications, the use of such instruments in dentistry is particularly appropriate [5].

As the development of such measures has increased, several groups have produced guidelines for PROMs to aid appraisal and appropriate selection of these instruments. The Scientific Advisory Committee of the Medical Outcomes Trust initially published a set of criteria for assessment of health status and quality of life measures in 1996 [6]. These were updated in 2002 to reflect the emerging techniques being used in the development of these measures [7]. The authors suggest eight key areas for consideration (conceptual and measurement model; reliability; validity; responsiveness; interpretability; respondent and administrative burden; alternate forms; and cultural and language adaptations) and criteria against which measures can be reviewed. These guidelines were developed to help the Medical Outcomes Trust (MOT) to evaluate new measures submitted to them and to ascertain which were suitable for dissemination. However, although they provide clear information regarding the areas to be assessed, no specific quality standards were included.

More recently a checklist has been produced by the Consensus-based Standards for the Selection of Health Measurement Instruments initiative (COSMIN) which allows articles reporting on the evaluation of PROMs to be evaluated against defined criteria [8]. It is hoped that the use of this checklist will standardise systematic reviews of PROMs and identify areas for refinement. The categories match those of the MOT and the group has also produced explicit quality criteria for each category [9]. These criteria are shown in Table 1.

Table 1 Quality criteria based on those proposed by Terwee and colleagues [9]

Over the past few decades many PROMs have been produced which purport to measure oral health-related quality of life (OHRQoL). OHRQoL was defined by Locker and Allen as “The impact of oral diseases and disorders on aspects of everyday life that a patient or person values, that are of sufficient magnitude, in terms of frequency, severity or duration to affect their experience and perception of their life overall” [10]. However, a number of the questionnaires developed have involved only limited input from lay people. They may therefore be more accurately described as measures of oral health status, as without patient involvement in their development it is difficult to ascertain whether the items accurately reflect what is important to patients [10].

The application of measures can vary according to the aim of the investigation, for example, they may be used to influence health and social policy, assess the impact of different treatment regimens or be used to analyse change in individual patients over time (Table 2).

Table 2 Summary of the applications of OHRQoL measures proposed by Robinson and co-workers [11]

Although the criteria proposed by the MOT and the COSMIN group address the psychometric properties of outcome measures, they do not specifically focus on the purpose and patient-centred nature of the instruments, and thus on whether they contain items which may reflect OHRQoL. To explore these factors, Locker and Allen [10] performed a review of OHRQoL measures using criteria modified from those suggested by Gill and Feinstein [12] and Guyatt and Cook [13]. Specific questions were as follows:

1. Is the stated aim to measure OHRQoL and is this explicit? If so, are these constructs defined and are the constituent domains identified?
2. If not, is there an alternative construct measured by the instrument specified and defined and its constituent domains identified?
3. Do the investigators specify the contexts in which the measure is to be used? Was it developed for use with groups (as in surveys or clinical trials) or individuals (as in clinical practice)?
4. Were the items comprising the questionnaire derived from qualitative interviews with those intended to complete the questionnaire?
5. Is there evidence that the aspects of life the items address are important to those who will be completing the questionnaire?
6. Does the questionnaire contain global ratings of health-related quality of life or quality of life?
7. How was the measure validated? Was it tested against oral health indicators or were broader indicators that may capture aspects of quality of life used?

The review found that, although the measures covered a variety of areas such as functional and psychosocial aspects of oral health, there was a degree of uncertainty regarding whether they actually measured OHRQoL or quality of life.

Following the development of measures for use in adults, several questionnaires have been produced for use with children or using parents as proxies. These generic questionnaires are designed to cover a variety of oral conditions such as dental caries, malocclusion and craniofacial anomalies. They include the Child Perceptions Questionnaire (CPQ) [14–16], the Child Oral Impacts on Daily Performances Index (C-OIDP) [17], the Child Oral Health Impact Profile (COHIP) [18], the Early Childhood Oral Health Impact Scale (ECOHIS) [19], the Scale of Oral Health Outcomes for 5-year-old children (SOHO-5) [20], the Michigan Oral Health-Related Quality of Life scale (MOHRQoL) [21] and the Pediatric Oral Health-Related Quality of Life Measure (POQL) [22]. All but the MOHRQoL and ECOHIS are designed for self-report.

The most frequently used measures for self-completion by children are the CPQ, the C-OIDP and the COHIP. These measures were chosen for inclusion in this review as they cover a wide age range and variety of conditions and are therefore most likely to be of use in a range of studies. Measures which are completed by proxies were not included as it has been demonstrated that there may be discrepancies between proxy scores and those provided by children themselves [23–25]. The CPQ is part of a battery of questionnaires for children and their carers [14–16]. There are versions for 11-14-year-olds and 8-10-year-olds, and four short forms based on the measure for 11-14-year-olds. The C-OIDP was adapted for use in children from the Oral Impacts on Daily Performances index, which is frequently used in adult populations [17]. Finally, the COHIP is designed for 8-15-year-olds and was derived from the same initial item list as the CPQ [18].

Although these measures are frequently used and have been translated into many different languages, to date there has been no review of their development, validation and use. Therefore the aim was to assess the methodological quality of the development and testing of CPQ, C-OIDP and COHIP. To fulfil this aim, the specific objectives were to:

1. describe these measures and their use
2. assess the methodological quality and measurement properties against existing criteria.

The criteria used were based on those described by Locker and Allen and on the COSMIN criteria [8–10]. The findings of this study will help researchers select the most appropriate measure to use in future projects and provide recommendations for refinement of these measures.

Methods

Search strategy

A systematic search strategy was used to identify eligible studies, using the MeSH terms “child” and “quality of life” in combination with the names or the commonly used acronyms of the three measures. Both MEDLINE (through PubMed) and Web of Science were used to search for articles published up to December 2012. Reference lists of included studies were also searched to identify additional studies.

Selection criteria

Titles and abstracts were read independently by two investigators (FG and ZM) to ascertain whether they met the inclusion criteria. Disagreements were resolved by discussion and where doubt existed, the full paper was retrieved. A paper was judged to be suitable for inclusion if:

it used either the CPQ, COHIP or C-OIDP (or versions of them)
it included participants aged 16 years or younger
the measures were completed by the participants, not proxies
the full paper was available in English
it reported primary data

Data collection

1. Description of measures and their use (Objective 1)

To fulfil objective one and describe the measures and their use, data were collected relating to:

the aim of the study
the measure used
study type (for example, development, validation, cross-cultural adaptation)
population (i.e. clinical, school-based)
measurement properties (detailed below)
development of the measure, described using the criteria proposed by Locker and Allen [10]

Data were collected by two teams of two investigators (FG/HDR and ZM/CD) for all included studies. A protocol describing the data to be collected was produced. The data collection spreadsheet was piloted using 10 articles, following which descriptors were added to each of the categories to aid completion. A training exercise was then held with all investigators to ensure consistency of data extraction. Disagreements between investigators were resolved by discussion to reach a consensus.

2. Assessment of the methodological quality of the development and testing of measures (Objective 2)

The COSMIN checklist was used to evaluate the quality of studies that reported the development or evaluation of the original form of the CPQ, COHIP or C-OIDP in the original language [8]. This tool allows the methodological quality of studies to be assessed against criteria for each measurement property and has been used successfully in systematic reviews of outcome measures [26, 27]. The checklist contains 5–18 items per property which are rated excellent, good, fair or poor, with the lowest score for any item being assigned as the overall score for that property.
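The "lowest score counts" rule described above can be sketched in a few lines. This is a minimal illustration only: the function name and the example ratings are hypothetical and are not taken from the COSMIN materials or from the included studies.

```python
# Ordered from worst to best, following the four-point scale named in the text.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def overall_property_score(item_ratings):
    """Return the lowest rating among the checklist items for one property."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # An unknown rating label raises ValueError from list.index.
    return min(item_ratings, key=RATING_ORDER.index)


# Hypothetical item ratings for one measurement property in one study.
example_items = ["excellent", "good", "excellent", "fair", "good"]
print(overall_property_score(example_items))  # fair
```

Under this rule a single weak item determines the overall score, which is why a study can be rated poor for a property even when most items were handled well.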

Two reviewers (FG and ZM) decided which properties had been assessed in each study and assigned an overall score. A calibration exercise was held prior to data collection to ensure consistency. Disagreements were resolved by discussion between investigators to reach a consensus. Both intra- and inter-examiner reliability were assessed and were found to be excellent (weighted kappa > 0.9).
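For readers unfamiliar with the statistic, a weighted kappa for two sets of ordinal ratings can be computed as sketched below. The text does not state which weighting scheme was used, so linear disagreement weights are an assumption here, and the function name and example ratings are hypothetical.

```python
def linear_weighted_kappa(ratings_a, ratings_b, categories):
    """Cohen's kappa with linear disagreement weights for ordinal ratings.

    `categories` lists the rating labels in order, e.g. worst to best.
    Linear weighting is an assumption; the source does not specify it.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two ordered categories are required")
    position = {category: i for i, category in enumerate(categories)}
    n = len(ratings_a)

    # Observed proportion of items in each (rater A, rater B) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[position[a]][position[b]] += 1 / n

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row_totals[i] * col_totals[j]

    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1 - observed_disagreement / expected_disagreement


# Hypothetical ratings from two reviewers on the four-point scale.
scale = ["poor", "fair", "good", "excellent"]
reviewer_1 = ["poor", "fair", "good", "excellent", "good", "fair"]
reviewer_2 = ["poor", "fair", "good", "good", "good", "fair"]
print(round(linear_weighted_kappa(reviewer_1, reviewer_2, scale), 2))
```

Weighting penalises a disagreement of "good" versus "excellent" less than one of "poor" versus "excellent", which suits an ordered rating scale.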

Quality assessment rating

The rating system proposed by Terwee and colleagues [9] was used to assess the quality of the instruments using the results of the studies evaluated by the COSMIN checklist. This allows a positive, negative or indeterminate rating to be assigned depending on the published results (Table 1).

Measurement properties analysis

Validity, reliability, responsiveness and interpretability of the measures were analysed using the following aspects [9]:

Content validity: The degree to which the items in the questionnaire are a reflection of those important to the study population and to the construct under scrutiny. Four main areas were assessed:

1. Was the measurement aim stated, for example, is the questionnaire designed to be discriminative, evaluative or predictive?
2. The concept which the questionnaire was designed to measure is stated so that others can use it appropriately.
3. Methods for item selection and reduction are justified and should include the target population.
4. Interpretability of the questions, for example, these should be age-appropriate and should not require reading skills above that of a 12-year-old where they are designed for adults.

Construct validity: this refers to the extent to which scores relate to other measures of a similar concept under scrutiny and should be tested using predefined hypotheses to avoid bias.
Internal consistency: the extent to which items in the questionnaire measure the same construct. In classic test theory, this is expressed using Cronbach’s alpha. A low Cronbach’s alpha indicates a lack of correlation between items on the scale, meaning that combining them to give an overall score is not meaningful. A very high value, by contrast, indicates such strong correlation that some items may be redundant. Values of 0.7 to 0.95 are deemed to be acceptable for research tools. Principal component analysis or exploratory factor analysis, followed by confirmatory factor analysis, are the preferred methods for attaining homogeneous scales, as these allow redundant items to be removed and can be used to identify the number of subscales present.
Criterion validity: this relates to whether the scores on a particular questionnaire have a positive correlation with a gold standard. There are no gold standards in the field of OHRQoL and therefore measurement of this is only appropriate when testing a short form against the existing measure.
Test-retest reliability: the ability of the measure to produce reproducible results in a stable population over time. The time between administrations should be long enough to prevent recall but short enough to minimise changes in clinical status. One to two weeks is usually adequate; however, the clinical concern under investigation may require a different time interval, for example, in palliative care where deterioration in a patient’s health may occur rapidly. The most suitable expression of this value is the Intraclass Correlation Coefficient (ICC). Values greater than or equal to 0.7 are deemed acceptable.
Responsiveness: the ability of a questionnaire to detect clinically important changes over time, for example, after an intervention. Hypotheses should be predefined and tested.
Floor or ceiling effects: these were considered to be present where more than 15% of patients achieve the highest or lowest score possible. Where these are present, there may be issues with content validity as the extreme ends of the scale are not represented. In addition, participants who achieved the lowest or highest scores cannot be distinguished from each other, reducing reliability.
Interpretability: the degree to which scores on the questionnaire can be given qualitative meaning, for example, through the provision of means and standard deviations of scores for relevant subgroups (clinical diagnoses, age groups, gender).
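Two of the numeric criteria above, the acceptable range for Cronbach's alpha and the 15% threshold for floor or ceiling effects, can be illustrated with a short sketch. The function names and the example data are hypothetical, and the alpha calculation uses the standard sample-variance formula rather than any procedure reported in the included studies.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of participants, each a list of item scores."""
    if len(responses) < 2 or len(responses[0]) < 2:
        raise ValueError("need at least two participants and two items")
    n_items = len(responses[0])
    item_variances = [variance([r[i] for r in responses]) for i in range(n_items)]
    total_variance = variance([sum(r) for r in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary, so alpha is undefined")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


def alpha_in_acceptable_range(alpha, low=0.7, high=0.95):
    """True when alpha falls in the 0.7 to 0.95 range quoted in the text."""
    return low <= alpha <= high


def floor_ceiling_effects(total_scores, min_score, max_score, threshold=0.15):
    """Flag floor/ceiling effects when more than 15% hit the extreme score."""
    n = len(total_scores)
    floor_share = sum(score == min_score for score in total_scores) / n
    ceiling_share = sum(score == max_score for score in total_scores) / n
    return {"floor": floor_share > threshold, "ceiling": ceiling_share > threshold}


# Hypothetical responses: five participants answering three items scored 0-4.
sample = [[0, 1, 1], [2, 2, 3], [1, 1, 2], [4, 3, 4], [3, 3, 3]]
alpha = cronbach_alpha(sample)
print(round(alpha, 2), alpha_in_acceptable_range(alpha))

# Hypothetical total scores on a scale running from 0 to 148.
print(floor_ceiling_effects([0, 0, 5, 10, 20, 30, 40, 50, 60, 148], 0, 148))
```

Note that an acceptable alpha does not by itself establish unidimensionality, which is why the text recommends factor analysis alongside it.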

Best evidence synthesis

A best evidence synthesis was performed to summarise the evidence for each measure based on the methodological quality, consistency of results and the number of studies.

Two reviewers (FG and ZM) assessed the evidence for each measure and assigned a rating. A training exercise was held to ensure consistency. Disagreements were resolved by discussion between investigators to reach a consensus.

The results were defined as:

strong evidence: consistent findings in multiple studies of good methodological quality or one study of excellent quality
moderate evidence: consistent findings in multiple studies of fair methodological quality or one study of good quality
limited evidence: one study of fair methodological quality.

Where there were only studies with poor methodological quality or where statistical methods other than those recommended were used, a lack of evidence was noted.
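The rating rules above can be expressed as a small decision function. This is a sketch under stated assumptions: findings are taken to be consistent across studies, studies of poor quality are ignored, and a mixture of quality levels is resolved in the way shown in the comments, since the text does not cover mixed cases. The function name is hypothetical.

```python
ALLOWED = {"poor", "fair", "good", "excellent"}


def evidence_level(study_qualities):
    """Map methodological quality ratings of consistent studies to an evidence level."""
    unknown = set(study_qualities) - ALLOWED
    if unknown:
        raise ValueError(f"unknown quality ratings: {sorted(unknown)}")

    # Assumption: studies of poor quality contribute no evidence.
    usable = [q for q in study_qualities if q != "poor"]
    n_excellent = usable.count("excellent")
    n_good_or_better = sum(q in ("good", "excellent") for q in usable)

    # One excellent study, or multiple studies of at least good quality.
    if n_excellent >= 1 or n_good_or_better >= 2:
        return "strong"
    # One good study, or multiple studies of at least fair quality.
    if n_good_or_better == 1 or len(usable) >= 2:
        return "moderate"
    # A single study of fair quality.
    if len(usable) == 1:
        return "limited"
    return "lack of evidence"


print(evidence_level(["fair", "fair"]))  # moderate
print(evidence_level(["poor", "poor"]))  # lack of evidence
```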

Results

The search strategy yielded 653 papers. Four hundred and seventeen were duplicates, leaving a total of 236 abstracts. Following analysis of the abstracts, 126 full papers which appeared to meet the inclusion criteria were retrieved. Of these, six were excluded as they did not meet the inclusion criteria; therefore, 120 papers were included in the analysis (Figure 1). The majority used a version of the CPQ, most frequently the original version of CPQ_11–14 (Figure 1). Most papers reported cross-sectional studies (n = 117) with three of longitudinal design (Figure 2). The number of publications using these measures steadily increased from 2008 to 2011, reaching a peak of 21 in 2011. A decline, perhaps related to delays in indexing of the databases, was seen in 2012.

Fifteen studies which had used the original version of the measures in their original language were included in the COSMIN analysis. The following subsections will present findings relating to the evaluation of each questionnaire with the additional COSMIN analysis.

CPQ [14–16, 28–100]

This questionnaire was developed in Canada and was originally validated in children with caries, malocclusion and craniofacial anomalies. A number of versions have been produced. The original item pool was developed following a review of existing oral health and paediatric measures. This was further reduced following discussion with healthcare professionals, parents of children and children with a variety of oral conditions.

CPQ_11–14

Description of CPQ_11–14 and its uses

The aim of this questionnaire was to “produce a measure which conformed to contemporary concepts of child health and had discriminative and evaluative properties, and which is applicable to children with various dental, oral and oro-facial disorders”. Although not explicitly stated, these aims imply that the measure was designed to measure change at a group level. Potential items were divided into four domains: oral symptoms, functional limitations, emotional well-being and social well-being. An item impact study involving 82 children was used to reduce the number of items to 37 across the four domains. In addition, two global questions are included relating to the participant’s opinion of how their teeth and mouth affected their life overall and their perceived oral health status. The questions ask participants about the frequency of events in the previous three months and are scored on a five-point Likert scale from 0 to 4. A higher score indicates increased impact. The measure was validated by comparing scores between groups (caries, malocclusion, craniofacial) and by correlating overall scores with global ratings. Further details are shown in Table 3.
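As an illustration of the scoring described above, the sketch below totals the 37 item responses. Simple additive scoring is an assumption here, as the text states only the response range and the direction of the scale; the function name is hypothetical and the two global questions are left out of the total.

```python
ITEM_COUNT = 37                    # items across the four domains
MIN_RESPONSE, MAX_RESPONSE = 0, 4  # five-point frequency scale


def cpq_total_score(responses):
    """Sum the item responses (assumed additive scoring; higher means more impact)."""
    if len(responses) != ITEM_COUNT:
        raise ValueError(f"expected {ITEM_COUNT} item responses")
    for value in responses:
        if value not in range(MIN_RESPONSE, MAX_RESPONSE + 1):
            raise ValueError(f"response {value!r} is outside the 0-4 scale")
    return sum(responses)


# Under this assumption the possible range is 0 to 37 * 4 = 148.
print(cpq_total_score([MAX_RESPONSE] * ITEM_COUNT))  # 148
```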

Table 3 Characteristics of included measures

Study types/populations

Fifty-five papers used CPQ_11–14. Of these, one described development of the measure and seven its validation. Cross-cultural adaptation and validation of translated versions were described in 12 studies from Hong Kong, Brazil, Denmark, Uganda, Saudi Arabia, Thailand and Germany. One paper investigated agreement between self- and interview-administered versions, three studies analysed agreement between parent and child and one study reported on the changes in scores during orthodontic treatment. The remaining articles described OHRQoL in cross-sectional population studies and explored the impact of various dental and medical conditions.

CPQ_11–14 had been translated into Chinese, Brazilian-Portuguese, Danish, Luganda, Arabic, Thai and German. Further versions in Malay, Finnish, Norwegian and Russian, were described but no details were provided regarding their validation.

Measurement properties

Twelve studies reported test-retest reliability with ICCs ranging from 0.6 to 0.94. The test-retest period varied from one week to one month and involved between 14 and 84 participants.

Internal consistency was investigated in 20 studies for CPQ_11–14 with Cronbach’s alpha ranging from 0.72 to 0.95.

Criterion validity testing was not appropriate for this measure as there is no gold standard. Construct validity was measured using global ratings and clinical data. Positive correlations were found with global ratings but conflicting results were reported for correlations with clinical data.

No studies reported face or content validity testing, except during the development and cross-cultural adaptation of the measures.

Specific details regarding floor and ceiling effects were reported in only seven studies, with maximum proportions of 3% and 5% scoring zero or the maximum score respectively.

Although one study reported longitudinal data, there was no consideration of what would constitute a clinically important change in score.

Mean and subgroup scores, where available, are shown in Additional file 1.

Assessment of the methodological quality of the development and testing of CPQ_11–14

The CPQ_11–14 was studied in four papers in children with dental caries, enamel defects, malocclusion and craniofacial disorders. The original form has been validated in Canada and the UK.

Validity

Hypothesis testing for construct validity was performed in all four studies using correlations with clinical data and global ratings. The methodology was rated excellent in two cases [70, 76] and fair in the other two cases [14, 77]. The results for construct validity were rated positively in all studies. Content validity was considered in one study of fair methodology and rated positively [14]. Criterion validity was not applicable for this measure as there is no gold standard.

Reliability

Internal consistency was analysed in all four studies and the methodology was rated as poor in all, as the studies did not report testing of unidimensionality by factor analysis or item response theory. Internal consistency was therefore rated as indeterminate, although it should be noted that all studies reported Cronbach’s alpha values of between 0.7 and 0.95. Test-retest reliability was assessed in three studies, one of which was rated as good [70], one fair [14] and one poor [76]; all were rated positively for ICC.