INTRODUCTION

Accurately diagnosing opioid misuse is important in the setting of the current opioid epidemic. In 2017, more than 70,200 people in the USA died from drug overdose with a 5.9-fold increase in the total number of deaths involving opioids since 1999.1 Heroin use is strongly associated with prior history of opioid misuse,2 and approximately one-quarter of patients on chronic opioid therapy (COT) misuse opioids.3 Urine drug testing (UDT) is used to monitor patients on COT.4,5,6,7,8,9,10,11,12,13,14 UDT can assess medication adherence and detect aberrant behavior, such as diversion or non-prescribed drug use. However, the clinical utility of UDT and providers’ ability to correctly interpret it are of growing concern.12,13,14,15,16,17,18,19

Most providers have a limited understanding of opioid metabolism and cross-reactivity of immunoassays and are not proficient in the interpretation of UDT results.20,21,22,23,24,25,26 Family physicians and pediatricians performed poorly on surveys designed to assess knowledge of UDT by immunoassay.22, 24 Most emergency physicians were unable to name the drugs tested at their hospital.23 Two surveys of residents revealed that they were confident but inaccurate when interpreting UDT.20,26 Physicians with experience in COT fared no better, rarely answering more than half of the questions correctly in a survey assessing interpretive knowledge of toxicology immunoassays.21 However, in one case, a successful educational intervention was implemented to improve residents’ understanding of UDT, indicating that medical professionals would benefit from more education on results interpretation.27

Clinical decisions based on inaccurate result interpretations can have serious consequences for patients.28 Providers could erroneously suspect aberrant behavior, potentially resulting in discontinuation of medications. Moreover, illicit drug use, undisclosed prescription opioid use, or simulated compliance could go undetected if the results are incorrectly interpreted. Some recent guidelines and research in the field of pain management recommend replacing traditional immunoassay screens with more definitive testing such as liquid chromatography-tandem mass spectrometry (LC-MS/MS).12, 29,30,31,32,33,34 However, provider ability to interpret LC-MS/MS UDT results has not been previously studied.

This retrospective study examines provider ability to interpret and document the results of LC-MS/MS UDT for patients on COT, assesses the frequency of communicating UDT results to the patient, and investigates the potential clinical consequences of provider interpretations.

METHODS

Study Setting and Sample Selection

This retrospective chart review was approved by the Partners IRB committee and conducted at Brigham and Women’s Hospital (BWH), a 749-bed tertiary care hospital in Boston. Eligible patients included adult patients (≥ 18 years old) on COT for whom UDT by LC-MS/MS was ordered in 5 ambulatory clinics between August 2017 and February 2018, which included 3 primary care clinics, 1 oncology/palliative care clinic, and 1 pain management clinic. Among the three primary care locations, one (a community clinic) had a higher prevalence of patients with substance use disorders (SUD), so this location was analyzed separately.

From this pool of UDTs, we reviewed 183 randomly selected results to achieve our study design of 160 cases (80 aberrant and 80 non-aberrant). We excluded 23 cases. Reasons included documented direct communication between the provider and the laboratory for interpretation assistance (7 cases), patients not receiving COT (10 cases), and restricted patient information in the electronic health record (EHR) (2 cases). If multiple UDT specimens were submitted prior to the next follow-up appointment, we randomly selected one and excluded the rest (4 cases). We defined the next appointment as the nearest occurring encounter from the date of the randomly selected UDT.

Measures

The primary outcome was concordance between provider and laboratory interpretation of UDTs. Secondary outcomes included provider documentation of UDT results, acknowledgement of results in the EHR, communication of results to the patient, and rate of prescription refills during follow-up appointments.

Provider-level covariates included age, gender, education/degree, clinical training (residency and/or fellowship), years in practice, and practice location. We calculated age as the interval between the provider’s date of birth (obtained from the EHR) and the UDT collection date. We obtained gender, education/degree, clinical training, and practice location by cross-referencing the provider’s BWH online profile, Massachusetts Board of Registration database, and LinkedIn profiles. We calculated years in practice as the interval between the year of graduation from residency or fellowship (MDs), nurse practitioner school (NPs), or pharmacy school (PharmDs) and the UDT collection date.

Patient characteristics included age, race, SUD history, and type of pain (i.e., cancer, non-cancer, or both).

Laboratory Information and Result Interpretation

The BWH chemistry laboratory performs UDT by immunoassay as well as LC-MS/MS. However, LC-MS/MS is considered the gold standard due to its higher sensitivity and specificity to assess patient medication compliance and to detect aberrant behavior.12,29 BWH performs approximately 3500 UDT panels by LC-MS/MS annually. The panel detects opioids, benzodiazepines, and stimulants but not tetrahydrocannabinol. Please see Table A-1 online for a complete list of drugs and metabolites included in the panel.

One of two laboratory directors (A.P., S.M.) interpreted each LC-MS/MS UDT result in conjunction with prescribed medications (as documented in the EHR) at the time of sample collection. Both laboratory directors are experts in clinical chemistry and toxicology with more than 25 combined years of laboratory experience. Results were categorized as either “aberrant” or “non-aberrant,” by comparing the metabolites to the medications prescribed. Results were categorized as “aberrant” if they showed evidence of one or more of the following: illicit drug use, simulated compliance, not taking a prescribed drug(s), or taking a drug(s) not prescribed. The interpretation was classified as “non-aberrant” if results were consistent with the prescribed medications. For patients taking opioids on a PRN basis, both the presence and absence of the drug were considered non-aberrant. See Table A-2 online for definitions of aberrant and non-aberrant subcategories.
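The categorization rule above can be sketched in code. This is a deliberately simplified illustration, not the laboratory's actual procedure: the `UdtResult` structure, its field names, and the `classify` function are hypothetical, metabolite-to-parent-drug mapping is assumed to have been done already, and the 20 mg/dL creatinine cutoff is the dilution threshold quoted elsewhere in this article.

```python
from dataclasses import dataclass, field

# Creatinine below this value is treated as suggestive of urine dilution
# (threshold quoted in the article; applied here as a simplifying assumption).
DILUTION_CREATININE_MG_DL = 20.0


@dataclass
class UdtResult:
    """Hypothetical, simplified view of one LC-MS/MS result plus EHR context."""
    detected: set       # prescription drugs found (parent drug or metabolites)
    prescribed: set     # scheduled medications listed in the EHR
    prn: set = field(default_factory=set)      # as-needed (PRN) medications
    illicit: set = field(default_factory=set)  # illicit substances found
    creatinine_mg_dl: float = 100.0
    parent_drug_spike: bool = False  # abnormally high parent drug level


def classify(result: UdtResult) -> list:
    """Return the aberrant categories that apply; an empty list means non-aberrant."""
    categories = []
    if result.illicit:
        categories.append("illicit drug use")
    if (result.creatinine_mg_dl < DILUTION_CREATININE_MG_DL
            or result.parent_drug_spike):
        categories.append("simulated compliance")
    # PRN drugs are not in `prescribed`, so their absence is never flagged.
    if result.prescribed - result.detected:
        categories.append("not taking prescribed drug")
    # A detected drug that is neither scheduled nor PRN counts as unprescribed.
    if result.detected - result.prescribed - result.prn:
        categories.append("taking drug not prescribed")
    return categories
```

Returning a list rather than a single label reflects that the aberrant categories are not mutually exclusive: one specimen can, for example, be both dilute and positive for an unprescribed drug.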

Chart Review

Medical records were reviewed by one of three clinicians (I.C., S.M., J.S.) using a standardized review instrument. To assess reliability of the expert laboratory interpretations and medical record review, 10% of the cases were re-reviewed by a second physician or laboratory director. Please refer to Table A-3 online for the definitions of key terms.

Statistical Analysis

Cohen’s kappa statistical test was performed to assess agreement between two reviewers. The Fleiss kappa test, which assesses agreement between more than two reviewers, was used to assess concordance of medical record review.35
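For readers unfamiliar with the statistic, the following is a minimal sketch of Cohen’s kappa for two reviewers, written in plain Python rather than the Stata routines used in the study. The function name and the example ratings are hypothetical and are not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical example: 16 charts, the reviewers disagree on two of them.
reviewer_1 = ["aberrant"] * 8 + ["non-aberrant"] * 8
reviewer_2 = ["aberrant"] * 7 + ["non-aberrant"] * 8 + ["aberrant"]
print(cohens_kappa(reviewer_1, reviewer_2))  # 0.75
```

In this example the raw agreement is 14/16 (0.875), but half of that agreement would be expected by chance, so kappa is lower than the raw figure.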

We investigated associations between categorical variables using Pearson’s chi-squared (χ2) test and Fisher’s exact test when expected cell counts were < 5. We used Student’s t test for 2 group comparisons of normally distributed numeric variables and Mann-Whitney U (2 groups) and Kruskal-Wallis (> 2 groups) to compare non-normally distributed outcome variables. The association between number of UDTs ordered and age and years in practice was investigated using Spearman rank correlation. Univariate and multivariate analyses of provider characteristics against categorical outcome variables were performed using logistic regression models. Clinical training was excluded from the multivariate models due to reduced number of observations (i.e., not applicable for NPs and PharmDs) and its strong correlation with location. All calculations were performed using the statistical software package STATA version SE15.
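The rule for choosing between the chi-squared and Fisher’s exact tests can be illustrated for a 2 × 2 table. The sketch below is a plain-Python approximation of that rule, not the Stata code used for the analysis; the function names are hypothetical, the chi-squared statistic is computed without a continuity correction (an assumption), and every row and column total is assumed to be non-zero.

```python
import math


def expected_counts(table):
    """Expected cell counts of a 2x2 table under independence."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[r * k / n for k in cols] for r in rows]


def chi_squared_p(table):
    """Pearson chi-squared statistic and p value (1 df, no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper tail equals erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def fisher_exact_p(table):
    """Two-sided Fisher's exact p value: sum over tables no likelier than observed."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def compare_proportions(table):
    """Use Fisher's exact test when any expected count is below 5."""
    if min(min(row) for row in expected_counts(table)) < 5:
        return "fisher", fisher_exact_p(table)
    return "chi-squared", chi_squared_p(table)[1]
```

As a usage example, the aberrant versus non-aberrant counts reported in the Results for discordant (23 vs. 2) and concordant (27 vs. 36) interpretations can be passed as `compare_proportions([[23, 2], [27, 36]])`. All expected counts exceed 5, so the chi-squared branch is taken.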

RESULTS

Demographics

Among the 160 cases included for analysis, there were 136 unique patients and 39 providers who ordered UDTs. Most COT patients were white and were treated for non-cancer pain (Table 1). A diagnosis of SUD was present in 42.6% of patients. Most providers were men and physicians (Table 2). The most common residency and fellowship completed were internal medicine (59.0%) and pain medicine (12.8%), respectively. The median number of years of clinical experience was 10 (IQR 1.75–18.25). Half of the providers practiced in a primary care setting.

Table 1 Patient Characteristics
Table 2 Provider Characteristics

Urine Drug Testing Orders

The median number of UDTs ordered per patient/year was 2 (IQR 1–6). The number of UDTs ordered varied significantly based on the provider’s gender, degree/education, clinical training, and practice location (Table 3). Women (4 [IQR 4–10]), pharmacists (9 [IQR 3–13]), physicians who completed internal medicine training (3 [IQR 1–9]), and providers practicing in the community clinic (12 [IQR 6–14]) ordered the most UDTs per patient/year in their respective subcategories.

Table 3 Number of UDTs Ordered per Patient/Year by Provider Characteristics

Reviewer Agreement

Laboratory directors achieved perfect agreement for toxicology interpretations. For medical record review, reviewers achieved perfect agreement (K = 1.00) on the presence of COT and documentation of result communication with the patient. Reviewers achieved substantial agreement on provider concordance with the laboratory (K = 0.601), documentation of an interpretation (K = 0.741), history of SUD (K = 0.655), and result acknowledgment in the EHR (K = 0.737). Reviewers achieved fair agreement on prescription refills (K = 0.256).

Laboratory Test Result Interpretations

Among aberrant cases (n = 80), 37 (46%) contained illicit substances, of which the most common were cocaine (70%), fentanyl (27%), and morphine suggestive of heroin use (16%). Twenty-two (28%) were interpreted as simulated compliance, of which 19 (86%) had urine creatinine < 20 mg/dL (suggestive of urine dilution) and 3 (14%) had abnormally high levels of the parent drug (suggestive of dropping the drug into the urine sample). Eighteen (23%) samples lacked evidence that patients were taking their prescribed medication(s). Thirty-four (43%) were positive for medication(s) that patients were not prescribed. Among clinic locations, oncology (52 [65%]) and community clinic (42 [53%]) had the highest percentage of aberrant results (Fig. 1).

Figure 1

Breakdown of aberrant results by location. The percentage of aberrant cases for community clinic (dark blue bars), primary care (medium dark blue bars), oncology (medium light blue bars), and pain management (light blue bars) is depicted. The overall aberrant cases are broken down by location into 4 categories (simulated compliance, illicit use, not taking prescribed medication, taking medication not prescribed) and the percentage of cases in each category is depicted. Some cases may have results in more than one category. Results that differ significantly by location are shown with a * (p < 0.05) and ** (p < 0.001).

Provider Interpretation, Documentation, and Acknowledgement of Results

Provider interpretations of UDT were documented for 88/160 (55%) cases. Among the documented interpretations (n = 88), 63 (72%) provider interpretations were concordant with laboratory interpretations and 25 (28%) were discordant. Results were more likely to be aberrant when interpretations were discordant compared with when interpretations were concordant (23/25 [92%] vs. 27/63 [43%], p < 0.001). Simulated compliance was more frequently seen in discordant cases than concordant cases (28% vs. 6%, p = 0.006).

Providers electronically acknowledged 146/160 (91%) results in the EHR. Of those not acknowledged (n = 14), 6 (43%) were aberrant and 8 (57%) were non-aberrant. Thirty-six of the 160 (23%) cases had documented communication of the results to the patient. Providers documented communication of results more often when results were aberrant versus non-aberrant (33/80 [41%] vs. 3/80 [4%], p < 0.001). When there was no documentation of result communication to the patient (n = 124), 47 (38%) were aberrant and 77 (62%) were non-aberrant.

Follow-up and Prescription Refills

One hundred fifty-four of the 160 (96%) cases had a documented follow-up visit in the EHR. Among non-aberrant cases with documented follow-up, 74/76 (97%) opioid prescriptions were refilled. Among aberrant cases with documented follow-up, 67/78 (86%) prescriptions were refilled. Although not statistically significant (p = 0.51), 57/63 (90%) prescriptions were refilled for concordant interpretations and 25/25 (100%) prescriptions were refilled for discordant interpretations. If there was no documentation of an interpretation, 59/72 (82%) prescriptions were refilled.

Association between Outcomes and Provider Characteristics

Interpretation concordance and prescription refills were not significantly associated with provider characteristics in either the univariate (Table A-4 online) or multivariate analyses (Table A-5 online). In the multivariate analyses (Table 4), odds of documenting UDT results increased for each year out of training (OR 1.15 [95% CI 1.01–1.31]) but decreased for each year of provider age (OR 0.84 [95% CI 0.71–0.99]). Odds of documentation were higher for providers practicing in the oncology clinic (OR 18.61 [95% CI 4.54–76.30]) and community clinic (OR 8.22 [95% CI 1.52–44.16]). Compared with male providers, female providers were less likely to document UDT results (OR 0.16 [95% CI 0.04–0.66]) and communicate results to patients (OR 0.13 [95% CI 0.03–0.58]).
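To make the per-year odds ratios easier to read, the sketch below shows how an odds ratio and its interval relate to a logistic regression coefficient. It assumes Wald-type intervals of the form exp(β ± z·SE); the function names are hypothetical, and the coefficient and standard error are back-calculated from the rounded published values, so they are approximations rather than the fitted model output.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se, z=Z_95):
    """Odds ratio and Wald confidence interval from a logit coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coefficient_from_ci(odds_ratio, lower, upper, z=Z_95):
    """Approximate coefficient and standard error from a reported OR and CI."""
    beta = math.log(odds_ratio)
    # On the log scale the interval is symmetric with half-width z * se.
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se


# Reported estimate for years out of training: OR 1.15 (95% CI 1.01-1.31).
beta, se = coefficient_from_ci(1.15, 1.01, 1.31)
or_per_year, lower, upper = odds_ratio_ci(beta, se)
print(round(or_per_year, 2), round(lower, 2), round(upper, 2))

# Under the model's log-linear assumption, a k-year difference multiplies
# the odds by OR ** k (here k = 10).
print(round(or_per_year ** 10, 2))
```

Because odds ratios compound multiplicatively, a per-year estimate of 1.15 corresponds to roughly a fourfold difference in odds across 10 years of practice, provided the log-linear assumption holds over that range.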

Table 4 Multivariate Analysis of Acknowledgement, Documentation, and Communication with Patient of UDT Results by Provider Characteristics

DISCUSSION

Our study demonstrated that providers across medical specialties have difficulty correctly interpreting UDT by LC-MS/MS. They also infrequently document their interpretation and rarely document communication of results with patients. Furthermore, prescription refills frequently occurred when no documentation of interpretation was present and when provider interpretation was discordant with laboratory interpretation.

Strengths of this study include examining provider UDT interpretation accuracy in a live practice environment, using the new gold standard for opioid compliance monitoring (i.e., LC-MS/MS), and defining four non-mutually exclusive categories of aberrant UDT results similar to categories previously described in the literature.4,8,13 Compared with testing clinicians on simulated cases, studying provider behavior in a live environment allows us to capture actual behavior following an interpretation (e.g., prescription refills, results communication). The high sensitivity and specificity of LC-MS/MS allows the laboratory and clinicians to interpret the results with fewer concerns of false-positive or false-negative results.12,29,30,31,32,33 Moreover, using well-described categories of aberrancy allowed us to compare our findings with preexisting literature. Our prevalence of aberrancy subtypes resembled prior literature with some differences (e.g., similarly high rates of illicit drug use but our study had higher rates of simulated compliance).8,13,36

Providers incorrectly interpreted UDT approximately one-quarter of the time across all disciplines. The prevalence of diagnostic error of opioid misuse and compliance is unknown, but diagnostic error in the ambulatory setting is estimated to occur in 5% of encounters.37 We found that providers had the most inaccuracies interpreting aberrant results, which is not surprising given the complexity of the opioid metabolic pathway and the need to recognize more nuanced aspects of interpretation (e.g., urine dilution). The number of factors that influence drug metabolism (e.g., pharmacogenetics) and the time window of detection of drugs and/or metabolites in urine can also complicate result interpretation.29

Providers were significantly more likely to misinterpret cases of simulated compliance. Twenty-eight percent of aberrant results (14% of all cases) were consistent with simulated compliance, of which 86% were due to concerns for dilution. The Substance Abuse and Mental Health Services Administration (SAMHSA) considers a creatinine < 20 mg/dL suspicious for urine dilution,38 which can occur when patients either ingest large amounts of liquid before providing the specimen or add water directly to the specimen to obscure illicit drug use. Only 3 patients had evidence of simulated compliance by dropping a drug directly into their urine. Although aberrancy by simulated compliance occurs less frequently than illicit drug use, this type of aberrancy is often overlooked by providers and thus warrants increased vigilance.

Documentation of UDT interpretations was infrequent, which is consistent with prior studies.25,39 Although almost all providers acknowledged the results in the EHR by clicking “marked as reviewed,” documentation of UDT interpretations occurred only 55% of the time. This suggests that providers may be acknowledging results to clear their queues without critically reviewing them, or they are unsure how to interpret the results and thus avoid documenting their interpretations. Furthermore, communication of results to the patient was documented in only 23% of cases. Although providers may have verbally communicated results to the patient without documenting the discussion, this practice exposes providers to liability since opioid prescribing is considered a high-risk endeavor.40

The number of UDTs ordered per patient/year varied by provider characteristics and clinic location. Although some variation may be explained by differences among each clinic’s patient population (e.g., SUD prevalence), variations were also associated with the provider’s training background. This is not surprising since consensus recommendations are clinical society-based.41,42,43 Guidelines vary from no specific recommendations about UDT ordering frequency to repeating UDT at least 8 times a year.6,12,14,29,44,45,46

Inappropriate prescribing in the setting of UDT misinterpretation is seldom recognized as a potential cause of harm and risks contributing to the opioid epidemic. Nonetheless, the decision to change opioid prescribing practices is complex and hinges upon multiple patient variables, even among aberrant UDT results.39,47 In our study, 86% of aberrant cases and 100% of discordant interpretations had a refill at the subsequent follow-up appointment. Without additional information, we cannot definitively conclude whether these refills were truly inappropriate. However, our findings do suggest that there is a potential for inappropriate prescribing with such misinterpretations.

The lack of association between provider characteristics and interpretation concordance or prescription refills is likely due to lack of power. The association of practice location with increased odds of communication of results and documentation may be due to higher prevalence of illicit drug use in the oncology and community clinics. Although this finding was statistically significant, the confidence intervals were wide because of the small sample sizes in each clinic. The decreased odds of documenting UDT results among female providers and the opposite directions of the associations for provider age and years in practice are harder to explain. Future studies examining provider attitudes towards UDT documentation and result communication would help clarify these findings.

Providers might benefit from expert assistance when interpreting UDT results. There is evidence to suggest a collective intelligence approach is associated with higher diagnostic accuracy.48 One study implemented a pharmacist e-consult UDT interpretation service that could help guide provider follow-up actions.49 Other strategies may include involving pathologists and laboratory services as part of the diagnostic team.50,51,52

This study has several limitations. First, inter-observer variability during chart reviews may introduce information bias. However, a standardized template was utilized for both medical record review and laboratory interpretations and the Cohen and Fleiss Kappa statistics suggest that interpretations and reviews were consistent among reviewers. Second, by randomly selecting cases, providers who routinely ordered UDT more frequently or ordered additional UDT specifically to address aberrancy were more likely to be included. Because these providers may be more knowledgeable at interpreting results, inclusion of these providers conservatively biases results towards greater interpretation knowledge when compared with those who order UDT less frequently, thus strengthening the generalizability of our conclusions. Third, we were unable to exclude cases when providers contacted the lab and no documentation of such communication was charted. Such communication occurs infrequently and would also make our results more conservative, underestimating the overall problem of provider misinterpretations. Fourth, laboratory directors based their interpretations on the results and medications listed in the EHR. They did not have access to the Massachusetts Prescription Monitoring Program fill data because review of this database is not permitted for research studies. Such access could have provided a more comprehensive account of prescribed and dispensed controlled substances. Fifth, we assumed that patients with PRN prescriptions were non-aberrant if drugs and/or metabolites were not detected. However, providers may have been expecting these patients to have the drug(s) and/or metabolite(s) in the urine, leading to erroneous classification as non-aberrant. Finally, providers may have copied and pasted previous notes, resulting in an overestimation of documentation.

CONCLUSION

This is the first study to evaluate provider interpretation and documentation of definitive UDT by LC-MS/MS. Erroneous provider interpretation of UDT results, infrequent documentation of result interpretation, lack of documented communication of results to patients, and prescription refills despite discordant interpretation are common. Expert assistance with urine toxicology interpretations may be needed to improve provider accuracy when interpreting toxicology results.