The use of prescription opioids for chronic pain treatment has increased worldwide (Helmerhorst et al., 2017; Kalkman et al., 2022). However, long-term use of these analgesics is considered a risk factor for prescription opioid abuse, overdose, and other harms (Gallagher & Galvin, 2018; Volkow & McLellan, 2016). Several authors have pointed out the importance of risk assessment instruments in the prediction and prevention of prescription opioid use disorders (Ballantyne, 2017; Carballo et al., 2016; Lawrence et al., 2017). Opioid misuse prevalence is estimated at between 21 and 29% in the chronic pain population (Vowles et al., 2015), although data vary widely due to the heterogeneity of definitions and methodologies used to assess misuse in previous studies (Voon et al., 2017). Unfortunately, few instruments for predicting opioid misuse have been adapted for the chronic pain population, and evidence on the accuracy of the adapted instruments is scarce (Smith et al., 2015; Voon et al., 2017).

The Prescription Opioid Misuse Index (POMI; Knisely et al., 2008) is one of the most used instruments to assess problematic prescription opioid use (Lawrence et al., 2017; Smith et al., 2017). It is a brief questionnaire made up of 6 binary items (yes/no), which identifies patients who use opioid medication inappropriately (score≥2). The initial validation of the instrument was carried out in a mixed sample of chronic pain patients (n=34) and opioid abusers (n=40), and the results showed good internal consistency (Cronbach’s alpha of 0.85). However, the small and homogeneous sample of chronic pain patients limited the generalization of the findings. For this reason, two recent validation studies have further tested the psychometric properties of the POMI in different samples of chronic pain patients (Delage et al., 2022; Laporte et al., 2022). The scale seems to perform better when used in pain clinics (Delage et al., 2022) than in general practice settings (Laporte et al., 2022). The authors also propose a shorter version of the POMI (POMI-5F; Delage et al., 2022), which also shows acceptable psychometric properties (Cronbach’s alpha of 0.71) and good external validity with the DSM-5 prescription opioid use disorder (POUD) criteria (r = 0.45; p < 0.001). Although both validations were conducted in larger samples of chronic pain patients, there are still substantial gaps in the assessment of the diagnostic efficacy of the POMI. First, there are no studies investigating its measurement invariance. Since males are significantly more likely to report prescription opioid misuse (Silver & Hur, 2020), sex invariance should be tested so that differences between male and female patients can be interpreted reliably. Second, both validations relied on a classical test theory (CTT) approach. This implies that the items of the POMI were analyzed using reliability parameters that are test- and sample-dependent. Modern methods, such as item response theory (IRT), estimate item parameters that are sample invariant and provide a richer description of the scale performance, at both item and test level (Hambleton et al., 1991). Moreover, the IRT approach is useful to estimate how the precision of each item may vary across different levels of the construct (i.e., severity of the prescription opioid misuse).

To overcome these limitations, the aims of this study were (a) to further validate the POMI on a larger sample of chronic pain patients using both IRT and CTT approaches, and (b) to examine differential item functioning (DIF) across sex.

Methods

Participants

For this prospective study, participants were recruited from the Pain Care Unit (PCU) of the General University Hospital of Elche. All participants were diagnosed with chronic non-cancer pain and were currently under opioid treatment. Inclusion criteria were as follows: (1) being older than 18 years, (2) having been diagnosed with chronic non-cancer pain, and (3) having been under opioid therapy for pain for at least 3 months, which has been defined as long-term opioid treatment (Chou et al., 2015; Dowell et al., 2016). The exclusion criteria were (1) presenting severe mental pathologies (e.g., psychosis or dementia), (2) having a diagnosis of cancer, or (3) not being able to be correctly evaluated (e.g., due to advanced age or not agreeing to undergo the complete assessment).

Although there are no general guidelines on the optimal sample size for IRT analyses, a minimum sample size of 200 was considered sufficient for estimating stable IRT model parameters (Edelen & Reeve, 2007; Morizot et al., 2007). A total of 267 chronic non-cancer pain patients were recruited for the study, of whom 256 met the inclusion criteria. Thirty-one questionnaires were discarded because they were not properly completed by the participants; thus, the final sample was made up of 225 patients.

Instruments

Patient and clinical characteristics were assessed (age, sex, type and dosage in milligrams of opioids consumed, and time in treatment with prescription opioids). Each opioid dose prescribed was converted to parenteral morphine equivalent dose (MED), using the American Pain Society guidelines (American Pain Society, 2016). If multiple opioids were used, the total MED per day was calculated by adding the morphine-equivalent doses of each opioid prescribed. Pain intensity and interference were assessed with the Spanish version of the Brief Pain Inventory (BPI; Badia et al., 2003), which has shown good psychometric properties (Cronbach’s alpha values between 0.87 and 0.89 in both subscales). Higher scores on the BPI reflect higher levels of pain intensity (score range: 0–40) and interference (score range: 0–70).

The 6 items of the POMI (Knisely et al., 2008) were used to assess prescription opioid misuse. The instrument was self-administered under the supervision of a trained psychologist. The POMI questionnaire was translated into Spanish following the International Test Commission guidelines (Muñiz et al., 2013). Two native Spanish speakers, working independently, translated the original version of the POMI into Spanish. The Spanish version was then translated back into English, and no discrepancies were found between the two versions of the scale (see Appendix A for the Spanish version of the POMI).
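
As an illustration of how the POMI total score is obtained, the following minimal sketch sums the six yes/no answers and applies the original cut-off of 2 (Knisely et al., 2008). The function and constant names are hypothetical and are not part of the instrument or of the analyses reported here.

```python
from typing import Sequence

POMI_N_ITEMS = 6   # the POMI has six binary (yes/no) items
POMI_CUTOFF = 2    # original cut-off: a score of 2 or more suggests misuse


def pomi_score(answers: Sequence[bool]) -> int:
    """Return the POMI total score, i.e. the number of 'yes' answers."""
    if len(answers) != POMI_N_ITEMS:
        raise ValueError(f"expected {POMI_N_ITEMS} answers, got {len(answers)}")
    return sum(1 for answer in answers if answer)


def pomi_positive(answers: Sequence[bool], cutoff: int = POMI_CUTOFF) -> bool:
    """Return True when the total score reaches the screening cut-off."""
    return pomi_score(answers) >= cutoff
```

Keeping the cut-off as a parameter makes it straightforward to evaluate alternative thresholds on the same responses.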

DSM-5 criteria for prescription opioid-use disorder (American Psychiatric Association, 2013) were assessed using the opioid dependence symptoms of the DSM-IV-TR checklist (excluding tolerance and withdrawal) and all the abuse symptoms of the DSM-IV-TR checklist, except legal problems. The craving criterion was assessed with the 5-item Craving Scale of Weiss (1995), which has been widely used for assessing craving for prescription opioids (M. Martel et al., 2014; M. O. Martel et al., 2016; McHugh et al., 2013; Wasan et al., 2012). A score equal to or greater than 1 was considered indicative of the presence of craving. Finally, based on the number of criteria met, severity of the disorder was classified as mild (2–3 symptoms), moderate (4–5 symptoms), or severe (6 or more symptoms).
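
The severity classification above can be sketched as a simple mapping from the number of criteria met to a severity label. The function name and the "none" label for fewer than 2 criteria are assumptions made for illustration only.

```python
def poud_severity(n_criteria: int) -> str:
    """Map the number of DSM-5 criteria met to a severity label.

    Thresholds follow the text: mild (2-3), moderate (4-5), severe (6 or more).
    The label "none" for fewer than 2 criteria is an illustrative assumption.
    """
    if n_criteria < 0:
        raise ValueError("the number of criteria cannot be negative")
    if n_criteria >= 6:
        return "severe"
    if n_criteria >= 4:
        return "moderate"
    if n_criteria >= 2:
        return "mild"
    return "none"
```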

Procedure

Recruitment and assessment of the participants were conducted in the PCU, during regular consultation hours. Informed consent was obtained from all individuals included in the study. No participant was paid for taking part in it. Data were collected from September 2014 to January 2019 from all patients who came to the PCU. The instruments were administered individually in a single face-to-face session conducted by trained psychologists. To maintain confidentiality and anonymity of the data, identity information about patients was not included in the database. The study procedures were in accordance with the ethical standards of the Declaration of Helsinki and were approved by the Committee of Research and Ethics of the University Miguel Hernández de Elche and of the hospital.

Data Analysis

The psychometric properties and clinical utility of the POMI were assessed using CTT and IRT approaches. Data analyses were carried out with SPSS v.26 and the lavaan, mirt, ltm, eRm, difR, and pROC packages of the R statistical software.

Item Scale Reliability

Descriptive analyses were performed for each of the POMI items. McDonald’s categorical omega (ωcat) was calculated to determine the internal consistency of the scale. Values greater than or equal to 0.70 were considered acceptable (Nunnally & Bernstein, 1994). Item-total correlations, both uncorrected and corrected for item overlap, were computed. Item discrimination indices were classified as excellent (≥0.40), good (0.30–0.39), acceptable (0.20–0.29), and poor (<0.20) (Ebel & Frisbie, 1991). From the IRT approach, one- (Rasch) and two-parameter (2PL) logistic models for dichotomous item responses were also fitted to the data set. Discrimination (a) and difficulty (b) parameters were estimated for each item of the POMI. Discrimination parameters (with values typically ranging from 0 to 3) indicate how well each item can differentiate between patients with different levels of the latent trait (i.e., prescription opioid misuse). The a values are interpreted as very low (a≤0.34), low (a=0.35–0.64), moderate (a=0.65–1.34), high (a=1.35–1.69), and very high (a≥1.70) discrimination (Kim & Baker, 2018). Difficulty or severity parameters (values approximately range from −3 to 3) indicate the severity level of the latent trait where the probability of endorsing the item (responding “yes”) is 50%. Larger positive difficulty values indicate that higher values of the latent trait (i.e., higher severity of prescription opioid misuse) are necessary to endorse the item.
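
To make the role of the two item parameters concrete, the following sketch computes the probability of endorsing an item under the 2PL model and labels a discrimination value using the bands given above. It assumes the logistic metric without a scaling constant; the function names are hypothetical, and the code is not the estimation routine used in this study (the models were fitted in R). Values falling between the printed bands (e.g., a=0.345) are assigned to the lower band.

```python
import math


def p_endorse(theta: float, a: float, b: float) -> float:
    """2PL probability of answering 'yes' at latent trait level theta.

    a is the discrimination and b the difficulty (severity) of the item.
    The Rasch model is the special case in which a is the same for all items.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def discrimination_label(a: float) -> str:
    """Label a discrimination value using the bands quoted in the text."""
    if a >= 1.70:
        return "very high"
    if a >= 1.35:
        return "high"
    if a >= 0.65:
        return "moderate"
    if a >= 0.35:
        return "low"
    return "very low"
```

When theta equals b, the function returns 0.5 whatever the value of a, which matches the definition of difficulty as the trait level at which endorsement is 50% likely.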

In the Rasch model, the discrimination parameters are constrained to be equal for all items (a set to 1) and only difficulty varies between items. The 2PL is an extension of the Rasch model that allows discrimination parameters to differ per item. The likelihood-ratio test was used to compare the performance of both nested models. A significant result (p<0.05) indicates that the assumption of the parsimonious model (Rasch) does not hold, and the more complex model (2PL) fits the data better. Test information function, which sums the information (precision) of all items of the scale, was also calculated to examine the overall accuracy of the POMI along the latent trait continuum.
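
The quantities described in this paragraph can be sketched as follows, assuming the standard 2PL item information formula, a² · P · (1 − P), and the usual likelihood-ratio statistic for nested models. Function names are hypothetical, and any item parameters passed in are illustrative rather than estimates from this study.

```python
import math
from typing import Sequence, Tuple

Item = Tuple[float, float]  # (discrimination a, difficulty b)


def item_information(theta: float, a: float, b: float) -> float:
    """Information contributed by one 2PL item at trait level theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def scale_information(theta: float, items: Sequence[Item]) -> float:
    """Test information: the sum of the item information values at theta."""
    return sum(item_information(theta, a, b) for a, b in items)


def standard_error(theta: float, items: Sequence[Item]) -> float:
    """Standard error of the trait estimate, 1 / sqrt(test information)."""
    return 1.0 / math.sqrt(scale_information(theta, items))


def likelihood_ratio_statistic(loglik_restricted: float, loglik_full: float) -> float:
    """LR statistic comparing a restricted model (Rasch) with a fuller one (2PL).

    It is referred to a chi-square distribution whose degrees of freedom equal
    the difference in the number of estimated parameters.
    """
    return 2.0 * (loglik_full - loglik_restricted)
```

Evaluating `scale_information` over a grid of theta values gives the curve from which the region of highest precision can be read.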

Differential Item Functioning (DIF)

Item invariance was tested to estimate whether males and females respond differently to each item of the POMI, after controlling for the latent trait level. Zumbo–Thomas (ZT) adjusted with Benjamini–Hochberg correction for multiple comparisons and Mantel–Haenszel (MH) χ² tests were used to analyze DIF. Items were only considered sex biased if both the ZT and MH statistics were statistically significant (p<0.05). DIF magnitude was also calculated using the ZT change in R² (ΔR²) and the MH delta difference (ΔMH). Both effect sizes were classified using the ETS criteria (Dorans & Holland, 1992), which designate items as A (nonsignificant or negligible DIF), B (moderate DIF), or C (large DIF).
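
The following sketch illustrates two of the computations named above: the MH delta difference, computed from 2×2 tables stratified by total score, and the Benjamini–Hochberg adjustment of a set of p-values. It assumes the usual definition ΔMH = −2.35 · ln(αMH), where αMH is the MH common odds ratio, and it uses a simplified ETS rule based on magnitude only (the full ETS rule also takes statistical significance into account). All names and the data layout are hypothetical; the analyses reported here were run with R packages, not with this code.

```python
import math
from typing import List, Sequence, Tuple

# One stratum (patients matched on total score):
# (reference yes, reference no, focal yes, focal no)
Stratum = Tuple[int, int, int, int]


def mh_delta(strata: Sequence[Stratum]) -> float:
    """Mantel-Haenszel delta: -2.35 * ln(common odds ratio across strata)."""
    numerator = denominator = 0.0
    for ref_yes, ref_no, foc_yes, foc_no in strata:
        n = ref_yes + ref_no + foc_yes + foc_no
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += ref_yes * foc_no / n
        denominator += ref_no * foc_yes / n
    if numerator == 0.0 or denominator == 0.0:
        raise ValueError("common odds ratio is undefined for these tables")
    return -2.35 * math.log(numerator / denominator)


def ets_magnitude_label(delta: float) -> str:
    """Simplified ETS label from |delta| only: A (<1), B (1 to <1.5), C (>=1.5)."""
    size = abs(delta)
    if size < 1.0:
        return "A"
    if size < 1.5:
        return "B"
    return "C"


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```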

Confirmatory Factor Analysis

The items were subjected to confirmatory factor analysis (CFA) in order to assess whether the POMI measures a unidimensional construct as intended. CFA was conducted using the diagonally weighted least squares (DWLS) estimator, which is suitable for dichotomous indicators. A good model fit was considered based on the following thresholds: Comparative Fit Index (CFI) and Tucker–Lewis index (TLI) values ≥0.96, and root mean square error of approximation (RMSEA) values ≤0.05 (Yu, 2002). The χ² statistic and degrees of freedom (df) were also reported. The unidimensionality hypothesis was also tested in the IRT model using the modified parallel analysis, with a significant result (p<0.05) indicating that the scale is not measuring a single trait.

Diagnostic Efficiency and Concurrent Validity

Receiver operating characteristic (ROC) curve analysis was used to establish the optimum cut-off score for detecting the presence of prescription opioid-use disorder (DSM-5 criteria). Two ROC curves were estimated, taking as the reference group either participants who met ≥2 criteria (mild to severe POUD) or only those who met ≥4 criteria (moderate to severe POUD). DeLong’s test was used to test differences between the areas under both ROC curves. Youden’s index (sensitivity + specificity – 1) (Youden, 1950), and positive (PPV) and negative predictive values (NPV) were computed to determine the optimal cut-off point of the POMI. Binary logistic regressions were also performed to determine the concurrent validity of the scale. All analyses were performed at a confidence level of 95%.
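
The cut-off selection described above can be sketched as follows: for each candidate cut-off, patients are cross-classified by screening result and reference diagnosis, and the cut-off with the highest Youden’s index is retained. The function names, the input layout, and the default range of candidate cut-offs (1 to 6) are assumptions made for illustration.

```python
from typing import Dict, Iterable, Sequence


def screening_metrics(scores: Sequence[int], has_disorder: Sequence[bool],
                      cutoff: int) -> Dict[str, float]:
    """Sensitivity, specificity, Youden's index, PPV and NPV at one cut-off."""
    if len(scores) != len(has_disorder):
        raise ValueError("scores and has_disorder must have the same length")
    if not any(has_disorder) or all(has_disorder):
        raise ValueError("both diagnostic groups must be present")
    tp = fp = fn = tn = 0
    for score, truth in zip(scores, has_disorder):
        positive = score >= cutoff
        if positive and truth:
            tp += 1
        elif positive:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1

    def ratio(numerator: int, denominator: int) -> float:
        # Undefined ratios (e.g., PPV with no positive screens) become NaN.
        return numerator / denominator if denominator else float("nan")

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1.0,
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def best_cutoff(scores: Sequence[int], has_disorder: Sequence[bool],
                candidates: Iterable[int] = range(1, 7)) -> int:
    """Return the candidate cut-off with the highest Youden's index."""
    return max(candidates,
               key=lambda c: screening_metrics(scores, has_disorder, c)["youden"])
```

If several cut-offs share the same Youden’s index, `max` keeps the first one in `candidates`, so the order of the candidates acts as the tie-breaking rule.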

Results

Participant Characteristics

Participant characteristics by sex are presented in Table 1. Mean age of the total sample was 56.52±13.19 years and 67.1% (n=151) were female. Average time in treatment with opioid medication was 27.40±30.54 months and the average morphine-equivalent dose was 37.92±48.92 milligrams per day. Mean pain intensity and interference scores were 24.19±6.66 and 43.86±15.01, respectively. No significant differences were found between males and females in demographic, clinical, or treatment characteristics (p>0.05).

Table 1 Patient characteristics by sex (n=225)

Of the participants, 28% (n=63) met criteria for moderate to severe DSM-5 prescription opioid-use disorder (≥4 symptoms). In addition, 36% (n=81) of the patients were classified as misusers using the original cut-off point of 2. Again, no significant differences were detected between both sexes (p>0.05).

Reliability and DIF by Sex

Categorical McDonald’s omega (ωcat) coefficient was 0.62 (CI 95%: 0.54–0.69), indicating a moderate but acceptable level of internal consistency. Table 2 shows the results of the reliability analyses using both CTT and IRT approaches.

Table 2 Item reliability results using classical test theory (CTT) and item response theory (Rasch and 2PL models) approaches

CTT discrimination indices showed that all items were within the good and excellent range (ITC=0.33–0.64), except for item 3 (“Do you ever need early refills for your pain medication?”), which had the lowest corrected item-total correlation (ITC=0.25).

Rasch and 2PL logistic IRT models were also fit to the POMI scale (see models’ fit in footnote of Table 2). The 2PL model showed a significantly improved fit over the Rasch model (likelihood-ratio test: χ²(5)=29.66, p<0.001), indicating that the POMI items exhibit varying levels of discrimination power. The 2PL discrimination parameters also identified the “early refill” item as weak (a=0.56), whereas the rest of the items showed moderate to very high discrimination power (a=0.97–3.61).

No items were flagged for DIF by sex using either the Mantel–Haenszel or the Zumbo–Thomas procedure (Table 3). These results indicate that female and male patients at the same level of prescription opioid misuse exhibit a similar probability of endorsing each item of the scale.

Table 3 Differential item functioning of the Prescription Opioid Misuse Index by sex

According to the test information function, the POMI was most accurate at theta values between 0.5 and 1.5. When latent trait values fall above or below this range, the information function declines (see Fig. 1). This suggests that the POMI provides greater precision for patients whose misuse levels are around or moderately above the mean of the latent trait (theta Θ value=0), which was 1.27±1.34.

Fig. 1 Test information function of the Prescription Opioid Misuse Index (POMI). The test information curve indicates measurement precision of the scale across the range of the latent variable (theta Θ values of the x-axis)

Confirmatory Factor Analyses

A CFA was carried out on the set of 6 items in order to assess whether the POMI measures a unidimensional construct. The results indicated that the one-factor model provided an excellent fit (DWLS χ²(9)=10.345, p=0.323; RMSEA=0.026, CFI=0.994, TLI=0.991), supporting the unidimensional structure of the questionnaire and the use of a single total score. Using the IRT approach, the modified parallel analysis also confirmed that the POMI measures a single trait (second eigenvalues averaged across 100 Monte Carlo samples=0.823, p=0.347).

Diagnostic Efficiency and Concurrent Validity

ROC analysis yielded an area under the curve (AUC) of 0.72 (p<0.001; CI 95%: 0.65–0.78) for the DSM-5 POUD criteria, when including mild to severe symptoms. For predicting only DSM-5 moderate to severe POUD, ROC analysis generated similar results (AUC=0.78; p<0.001; CI 95%: 0.72–0.85). There was no significant improvement in the overall screening accuracy of the POMI when comparing both AUCs (DeLong’s test after 10,000 bootstrapping iterations: D=−1.44, p=0.149).

As shown in Table 4, the cut-off score proposed in the original validation (≥2) was the one with the highest combination of sensitivity and specificity for diagnosing opioid use disorder, regardless of the classification used (DSM-5 mild to severe or DSM-5 moderate to severe). Figure 2 shows the diagnostic concordance between the POMI and the reference standard DSM-5.

Table 4 Diagnostic accuracy of the POMI in screening DSM-5 prescription opioid use disorder (POUD)

Fig. 2 Diagnostic performance of the POMI following the Standards for Reporting of Diagnostic Accuracy (STARD) guidelines

Finally, the results of logistic regressions showed odds ratios of 3.804 (CI 95%: 2.377–6.089, p<0.001) for DSM-5 mild to severe POUD symptoms and 7.824 (CI 95%: 4.079–15.004, p<0.001) for moderate to severe symptoms, supporting the concurrent validity of the POMI (score ≥2) with the DSM-5 criteria.

Discussion

Due to the methodological limitations of the previous validations of the POMI, the aims of this study were to further validate the instrument on a larger sample of chronic pain patients using both IRT and CTT approaches, and to test differential item functioning (DIF) across sex.

The results obtained show that the POMI is a valid and reliable single-factor instrument for use in the assessment of prescription opioid misuse. Considering the length of the instrument, the POMI demonstrated good internal consistency (ωcat =0.62) and high concurrent validity with DSM-5 opioid-use disorder criteria.

Around 39% of the participants responded “yes” to the item assessing the need for early refills of pain medication, which showed a poor discrimination index. This result is consistent with previous findings in which early refill was present in the same percentage of chronic pain patients with and without prescription drug use disorder (Meltzer et al., 2012), and it is not considered a direct measure of misuse (Lasser et al., 2016). However, because early refills could increase the risk of prescription opioid misuse (Khalid et al., 2015; Lange et al., 2015), this item was not removed from the instrument. Since the risk of prescription opioid misuse is associated with sex, we also tested measurement invariance of the POMI between male and female patients. The DIF results indicate that none of the six items of the scale is biased by sex and, therefore, all of them represent the latent structure of prescription opioid misuse equally in males and females. In terms of research implications, these findings provide evidence that the POMI can be used to make valid comparisons between the sexes. Moreover, the scale could be used in clinical settings, as it seems to provide a precise description of opioid misuse behaviors in both groups of chronic pain patients. In this sense, the POMI could meet the needs of daily clinical practice, since it allows clinicians to monitor opioid use with a self-reported measure that can be applied equally to male and female patients.

Concerning the diagnostic efficiency of the POMI, a cut-off point of 2 seems to maximize the accuracy of the instrument as a screening tool. When using this score, the POMI seems to perform well for detecting DSM-5 POUD, regardless of the severity of the disorder. Nonetheless, the POMI showed greater NPV values (86.8%) when screening moderate to severe POUD. Higher NPV values are strongly recommended when assessing conditions that can benefit from early identification and treatment (Trevethan, 2017), such as prescription opioid misuse (Ballantyne, 2017; Carballo et al., 2016; Dowell et al., 2016). Thus, the POMI seems to be more suitable for detecting higher levels of POUD severity, and less precise when including mild severity levels.

Because of its characteristics (short length and dichotomous answers), the POMI is a useful and clinically feasible first-step screening tool. In this sense, when a positive result is obtained, clinicians can carry out a more in-depth assessment of the patient in order to correctly identify the true positives (Lalkhen & McCluskey, 2008). Moreover, the scale appears to maintain its psychometric properties when used in different populations, which means that it can be used with both English- and Spanish-speaking chronic pain patients.

To our knowledge, this is the first validation of the POMI conducted using an IRT approach. The use of a larger sample of chronic pain patients is also a strength of this study. Moreover, the study was performed in a real-life scenario for chronic pain patients under opioid therapy, which also helps to demonstrate the usefulness of the scale in daily clinical practice. However, there are some limitations that should be noted. First, the predictive validity of the POMI could not be assessed. Second, the representativeness of the sample could be improved by using a random selection method. Nevertheless, the estimated prevalence of opioid-use disorder found in our study was closer to that reported in other reviews (Boscarino et al., 2015) than the one observed in the previous validations.

The findings of this study could be important to the field of long-term opioid therapy for chronic pain, where there is still a lack of scientific evidence on risk assessment instruments. Further studies are needed to assess the effectiveness of the POMI in reducing prescription opioid misuse during long-term therapy.