Rasch analysis of the Patient and Observer Scar Assessment Scale (POSAS) in burn scars

van der Wal, Martijn B. A.; Tuinebreijer, Wim E.; Bloemen, Monica C. T.; Verhaegen, Pauline D. H. M.; Middelkoop, Esther; van Zuijlen, Paul P. M.

doi:10.1007/s11136-011-9924-5

Open access
Published: 20 May 2011

Volume 21, pages 13–23, (2012)

Quality of Life Research

Martijn B. A. van der Wal^1,2,3,
Wim E. Tuinebreijer^1,
Monica C. T. Bloemen^1,2,
Pauline D. H. M. Verhaegen^1,2,3,4,5,
Esther Middelkoop^1,2,3 &
…
Paul P. M. van Zuijlen^1,2,3,4,5

Abstract

Purpose

The Patient and Observer Scar Assessment Scale (POSAS) is a questionnaire that was developed to assess scar quality. It consists of two separate six-item scales (Observer Scale and Patient Scale), both of which are scored on a 10-point rating scale. After many years of experience with this scale in burn scar assessment, it is appropriate to examine its psychometric properties using Rasch analysis.

Methods

Cross-sectional data collection from seven clinical trials resulted in a data set of 1,629 observer scores and 1,427 patient scores of burn scars. We examined the person–item map, item fit statistics, reliability, response category ordering, and dimensionality of the POSAS.

Results

The POSAS showed an adequate fit to the Rasch model, except for the item surface area. Person reliability of the Observer Scale and Patient Scale was 0.82 and 0.77, respectively. Dimensionality analysis revealed that the unexplained variance in the first contrast of both scales was 1.7 units. The Spearman correlation between the Observer Scale Rasch measure and the overall opinion of the clinician was 0.75.

Conclusion

The Rasch model demonstrated that the POSAS is a reliable and valid scale that measures a single construct, scar quality.

Introduction

Burn scars are known for their impact on quality of life due to an array of functional, cosmetic, and psychological problems related to scarring [1–3]. Several appropriate instruments are available that have been tested and validated to evaluate scar quality [4–6]. Scar assessment scales are often used because they are easily accessible and free of charge [7, 8].

In 2004, the Patient and Observer Scar Assessment Scale (POSAS) was introduced [9], which aimed at measuring the quality of scar tissue. The POSAS consists of an Observer and a Patient Scale and includes a comprehensive list of items, based on clinically relevant scar characteristics [10]. The observer scores six items: vascularization, pigmentation, thickness, surface roughness, pliability, and surface area. The patient scores six items: pain, pruritus, color, thickness, relief, and pliability (see “Appendix”) [10].

All included items are scored on the same polytomous 10-point scale, in which a score of 1 is given when the scar characteristic is comparable to ‘normal skin’ and a score of 10 reflects the ‘worst imaginable scar’. All items are summed to give a total scar score, and therefore, a higher score represents a poorer scar quality.

Studies that compared the POSAS with the widely used Vancouver Scar Scale revealed that the former was more reliable than the latter [9, 11]. At present, the POSAS is being used to evaluate the rehabilitation process in different types of injury [11–19] and has been advocated by many for scar assessment [2, 8, 11, 20].

Currently, all available scar assessment scales, including the POSAS, have been constructed and tested following the principles of classical test theory (CTT). However, modern test theories are considered superior to CTT because they make stronger assumptions and provide stronger findings. For this reason, the Rasch measurement model, one of the item response theory (IRT) models, is nowadays frequently applied in quality-of-life research [21–26]. Use of Rasch methodology involves a rigorous and extensive analysis of the data and provides additional psychometric information that cannot be obtained through the CTT approach. The data are tested for fit to the Rasch model, allowing for a detailed examination of the internal construct validity of the scale, including properties such as reliability and ordering of the categories. The analysis also determines whether a scale is unidimensional, which is required to justify summation of scores, and it can linearly transform raw scores from their original scale to an interval scale to allow application of parametric statistics.

After several years of using the POSAS for burn scar evaluation, it became appropriate to subject this tool to modern test theories. For this reason, we decided to apply the Rasch model [27] to our data.

Materials and methods

Data collection

Observer and Patient Scale scores were collected from a large database including five single-center and two multicenter clinical trials involving burn scars. All scores were obtained by clinical evaluation of the scars. In these trials, the scars were usually scored by multiple observers and at multiple time points. These scores were all included in the analysis because Rasch analysis evaluates the measurement scale and not the scar outcomes of the different treatment strategies.

Data analysis

The POSAS data were analyzed with the Rasch rating scale model using the Winsteps measurement software [28] (Winsteps® Rasch Measurement, version 3.69.1, Chicago, Illinois, USA). The following analyses were performed:

(1) Constructing the person–item map (Wright map);
(2) Testing of (mis)fit between the data and the model;
(3) Estimating the person and item reliability and separation coefficient;
(4) Testing the ordering of the categories;
(5) Analyzing the dimensionality;
(6) Predictive validity;
(7) Converting the logit scale to more meaningful units.

Person–item map

A map was constructed of the hierarchy of the person and item measures for both the Observer and Patient Scales to examine item and person performances. At the bottom of the map, the lower estimates of the person and item can be found, with increasing estimates represented higher up the map. On the left side, the patient performances are represented and on the right side the items. For a well-targeted measure, the mean location for the person should be around zero logits.

Test of (mis)fit to the model

To determine how well the empirical data fit the Rasch model, chi-square fit statistics were calculated. These fit statistics are the infit mean square (infit MNSQ) and the outfit mean square (outfit MNSQ). The infit MNSQ represents the information-weighted mean square residual difference between observed and expected responses. The infit statistics are sensitive to unexpected responses near the person’s ability level. The outfit statistic is the usual unweighted mean square residual and is more sensitive to outliers. The expected infit or outfit mean square values are 1.0. A mean square greater than 2.0 indicates more misinformation than information. Values should range between 0.5 and 1.7 for clinical observations [29]. High infit and outfit reflect underfit, which means lack of predictability of an item. Low infit and outfit reflect overfit, which means over-predictability of an item.
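The two fit statistics described above can be sketched in a few lines. This is a minimal illustration, not the Winsteps implementation: it assumes the expected score and the model variance of each response have already been obtained from a fitted Rasch model, and the function names `mean_square_fit` and `classify_fit` are hypothetical. The 0.5 and 1.7 cut-offs are the ones quoted in the text for clinical observations.

```python
def mean_square_fit(observed, expected, variance):
    """Return (infit, outfit) mean squares for one item across persons.

    observed: observed ratings
    expected: model-expected ratings (assumed to come from a fitted model)
    variance: model variance of each rating (assumed to come from the model)
    """
    if not (len(observed) == len(expected) == len(variance)):
        raise ValueError("inputs must have equal length")
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of the squared standardized residuals.
    outfit = sum(r / w for r, w in zip(squared_residuals, variance)) / len(observed)
    # Infit: information-weighted, i.e. summed squared residuals
    # divided by the summed model variances.
    infit = sum(squared_residuals) / sum(variance)
    return infit, outfit


def classify_fit(mnsq, low=0.5, high=1.7):
    """Label a mean square using the range quoted in the text."""
    if mnsq > high:
        return "underfit"
    if mnsq < low:
        return "overfit"
    return "acceptable"


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    infit, outfit = mean_square_fit([3, 5, 4], [2.0, 5.0, 6.0], [1.0, 2.0, 4.0])
    print(round(infit, 3), round(outfit, 3), classify_fit(infit))
```

Because the outfit statistic divides each squared residual by its own variance, a single surprising response from a person far from the item location can dominate it, which is why it is described as more sensitive to outliers.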

Reliability and separation statistics

In the Rasch model, reliability is estimated both for persons and for items. Person reliability in Winsteps is equivalent to the test reliability (Cronbach’s alpha) in the classical test theory. The person reliability reports how reproducible the person’s ability order is in this sample of persons for this set of items. The item reliability reports how reproducible the item’s difficulty order is for this set of items for this sample of persons. The higher the separation, the better the instrument is at differentiating person ability and item difficulty. Separation is measured on a continuous scale bounded by zero and infinity, which is an advantage over psychometric reliability which only ranges between zero and one. The person separation index can be used to calculate the number of distinct levels of scar quality (strata) that the items can distinguish [Strata = (4 × person separation index + 1)/3] [30, 31].
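As a worked illustration of the strata formula, the sketch below computes the number of strata from a separation index. It also uses the standard relation between reliability R and separation G, G = sqrt(R / (1 − R)), to go from a reliability coefficient to a separation index; that conversion is an addition for illustration and the function names are hypothetical. Values obtained this way from rounded reliabilities may differ slightly from the separation indices reported by the software.

```python
import math


def separation_from_reliability(reliability):
    """Separation index G implied by a reliability R, with 0 <= R < 1."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return math.sqrt(reliability / (1.0 - reliability))


def strata(separation):
    """Number of distinct levels: (4 * separation + 1) / 3."""
    return (4.0 * separation + 1.0) / 3.0


if __name__ == "__main__":
    # Hypothetical reliability value, for illustration only.
    g = separation_from_reliability(0.8)
    print(g, strata(g))
```

A separation of 2.0, for example, corresponds to three statistically distinct levels of the measured trait.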

Category function

Category functioning is examined by analyzing category frequencies, mean measures, thresholds, and category fit statistics [32]. The items of both the Observer and the Patient Scale have ten categories. The category frequencies indicate how many observers chose a particular response category. The recommended minimum number of responses per category is ten for stable rating scale–structure threshold parameter estimates [32]. The mean measures and the thresholds should increase when moving from lower to higher categories. Guidelines recommend that thresholds should increase by at least 1.4 logits, to show distinction between categories, but not more than 5 logits. When there are ordered categories, the category probability curves show that each category is the most probable category at some point on the latent variable. The partial credit model can be used when the rating scale is specific for each item, which is not the case in the POSAS. Nevertheless, this model also allows the category functioning of individual items to be examined.
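The threshold guidelines above lend themselves to a simple check. The sketch below is illustrative only: it assumes a list of estimated thresholds in logits, ordered from the lowest to the highest category boundary, and the function name `check_thresholds` is hypothetical. The 1.4 and 5 logit limits are those quoted in the text.

```python
def check_thresholds(thresholds, min_gap=1.4, max_gap=5.0):
    """Inspect the advance between adjacent thresholds (in logits).

    Index i in the returned lists refers to the step from
    thresholds[i] to thresholds[i + 1].
    """
    gaps = [b - a for a, b in zip(thresholds, thresholds[1:])]
    return {
        # Ordered means every threshold is higher than the previous one.
        "ordered": all(g > 0 for g in gaps),
        # Steps smaller than the recommended minimum (disordered steps included).
        "too_narrow": [i for i, g in enumerate(gaps) if g < min_gap],
        # Steps larger than the recommended maximum.
        "too_wide": [i for i, g in enumerate(gaps) if g > max_gap],
    }


if __name__ == "__main__":
    # Made-up thresholds, for illustration only.
    print(check_thresholds([-3.0, -1.0, 0.0, 4.0, 10.0]))
```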

Dimensionality investigation

According to the Rasch methodology, when the data fit the Rasch model, the Rasch dimension is the only dimension in the data. Rasch factor analysis is a factor analysis of the residuals that remain after the linear Rasch measure has been extracted from the data set. A secondary dimension in the data must explain at least two items' worth of variance: unless a component has the strength of at least two items, it may merely be due to an idiosyncratic item.
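The "two items" criterion can be made concrete with a small sketch. It assumes that standardized residuals per item are already available from a fitted model, builds their correlation matrix, and estimates the largest eigenvalue by power iteration. Since the eigenvalues of a correlation matrix sum to the number of items, the largest eigenvalue can be read in item units. This is a simplified stand-in for the residual analysis performed by dedicated software, and all function names are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlations between item columns of residuals.

    columns: one list of residuals per item, all of equal length.
    Columns with zero variance are not handled in this sketch.
    """
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


def largest_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of a symmetric positive semidefinite matrix."""
    k = len(matrix)
    # Non-uniform start vector to avoid starting orthogonal to the answer.
    vec = [1.0 / (i + 1) for i in range(k)]
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(p * p for p in product))
        if norm == 0.0:
            return 0.0
        vec = [p / norm for p in product]
    # Rayleigh quotient of the converged unit vector.
    product = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
    return sum(v * p for v, p in zip(vec, product))


def suggests_second_dimension(eigenvalue, min_items=2.0):
    """Apply the 'at least two items' rule from the text."""
    return eigenvalue >= min_items
```

Under this rule, a first-contrast value below 2 would not be taken as evidence of a secondary dimension.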

Predictive validity

All observers gave their overall opinion on the quality of the scar by assessing the item ‘overall opinion’. This item does not contribute to the total score and was shown to have a single ICC of 0.81 (95% CI: 0.75–0.86) [10]. It was used to calculate the Spearman correlation with the Observer Scale Rasch measure, indicating the predictive validity of the Observer Scale. The same method was applied to the patient’s overall opinion on the scar (single ICC: 0.84 (95% CI: 0.77–0.89)) and the Patient Scale Rasch measure.
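For readers unfamiliar with the statistic, a Spearman correlation is a Pearson correlation computed on ranks, with tied values sharing their average rank. Ties are likely here because the overall opinion is scored on a discrete scale. The sketch below is a plain standard-library illustration with hypothetical function names, not the procedure used by any particular statistics package.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("inputs must have equal length")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up scores and measures, for illustration only.
    overall_opinion = [2, 3, 3, 5, 7]
    rasch_measure = [-2.1, -1.4, -0.9, 0.2, 1.5]
    print(spearman(overall_opinion, rasch_measure))
```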

Converting the logit scale to more meaningful units

The measures in logits of the Observer and Patient Scales were rescaled to a user-friendly range of 0 to 100.
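A rescaling of this kind is a linear transformation, so it preserves the interval properties of the logit scale. The sketch below assumes that the lowest and highest measures of a scale are mapped to 0 and 100, respectively; the text does not state which anchor points were used, so the anchors and the function name `rescale_logit` are assumptions for illustration.

```python
def rescale_logit(logit, lowest, highest, new_min=0.0, new_max=100.0):
    """Linearly map a logit measure from [lowest, highest] to [new_min, new_max]."""
    if highest <= lowest:
        raise ValueError("highest must exceed lowest")
    fraction = (logit - lowest) / (highest - lowest)
    return new_min + fraction * (new_max - new_min)


if __name__ == "__main__":
    # Hypothetical anchors of -5.0 and 5.0 logits, for illustration only.
    for measure in (-5.0, -1.5, 0.0, 5.0):
        print(measure, rescale_logit(measure, -5.0, 5.0))
```

With these hypothetical anchors, a measure of 0 logits maps to 50 and each logit corresponds to 10 rescaled units.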

Results

The data collection resulted in the use of 1,629 Observer Scale scores and 1,427 Patient Scale scores taken from 707 patients, of whom 393 were men and 314 were women. The mean age of the patients at the time of the measurement was 28 years (median 24 years and range 0.4–86 years). One hundred and eighty patients were under 6 years of age, in which case a parent or caregiver completed the Patient Scale for the child. The measured scars had a mean age of 1.8 years (median 0.3 years and range 0.1–40 years).

The person–item maps

Figures 1 and 2 present the person–item maps. The items on the right side are located against the logit scale in the order of measurement. The default mean difficulty is set at zero. The Observer Scale map covers 11.4 logits (range −5.90; 5.51). In the Observer Scale, most persons are located at the middle of the map below the items. The mean Observer Scale measure of scar quality is −1.47 (SD 1.22) logits, which is more than 1 logit below the average difficulty of the items (the local origin, which is set at 0). The Patient Scale map covers about 7.4 logits (range −3.43; 3.94). The mean Patient Scale measure of scar quality is −0.52 (SD 0.89) logits, i.e., about 1/2 logit below the average difficulty of the items.

The item statistics table

Table 1 shows the items of the POSAS that are placed according to the hierarchy of the item difficulties. The measures are the item difficulty estimates. In the Observer Scale, the items thickness, surface roughness, and pigmentation have the values −0.05, −0.10, and −0.11 logits, respectively, which is nearly the same difficulty measure. The items vascularization and pliability have the values −0.56 and −0.58 logits, respectively, which is also nearly the same item difficulty measure. The inter-item separation of these items with the same difficulty and with surface area was larger than 0.15 logits, indicating no overlap between these items.

Table 1 Item statistics Observer Scale

All the items of the Observer Scale, except surface area, have mean square infit or outfit values between 0.5 and 1.7. Surface area has large infit and outfit values of 2.02 and 1.94, respectively, indicating underfit. In the Patient Scale (Table 2), the items thickness, surface roughness, and pliability have inter-item separation less than 0.15 logits, which indicates overlap between these three items. The inter-item separation of the other items was larger than 0.15 logits, indicating no overlap between these items. All the items of the Patient Scale have mean square infit or outfit values between 0.5 and 1.7.

Table 2 Item statistics Patient Scale