We recruited patients with chronic low back pain (CBP) lasting for at least 6 months, who were on opioid therapy for at least 3 months (opioid group). Controls were patients with CBP without opioid usage history (non-opioid group), who were matched to the opioid group in terms of age, sex, pain intensity, and pain duration. All participants were older than 18 years, fluent English speakers, and were able to understand instructions as well as questionnaires.
Participants were excluded if they (1) were treated with a spinal cord stimulator; (2) reported rheumatoid arthritis, ankylosing spondylitis, acute vertebral fractures, fibromyalgia, low back/spine oncologic history, or other comorbid neurological disorders, including major depression and psychiatric disorders requiring treatment; (3) were involved in litigation regarding their back pain, had a disability claim, or were receiving workman’s compensation; (4) reported another significant medical disease such as uncontrolled hypertension, unstable diabetes mellitus, renal insufficiency, congestive heart failure, coronary or peripheral vascular disease, chronic obstructive lung disease, or malignancy; or (5) were pregnant during the study.
Overall, 29 patients with CBP on opioid therapy (opioid group) and 29 without opioid use (non-opioid group) were included in this study. A power analysis indicated that 25 individuals per group would be sufficient to detect an effect size of 1.5 between the two groups with a statistical power of 0.8 at an alpha of 0.01 (two-tailed). This study was performed in accordance with the Helsinki Declaration of 1964 and its later amendments. All participants understood the purpose of the study, the procedures to be completed, and the possible benefits and potential risks, and signed a consent form prior to any study-related activities. The protocols were approved by the Northwestern University IRB (STU00207384, STU00205398).
Psychosocial, Sensorimotor, and Opioid Measures
All participants completed a demographic survey asking about their age, gender, smoking status, height, weight, history of surgery, history of back surgery, pain intensity, and pain duration. Pain intensity was measured by the Numerical Rating Scale (NRS). Opioid users reported details of their prescribed opioids and completed the Current Opioid Misuse Measure (COMM), which estimates the risk of opioid misuse. Dosages of prescribed opioids were converted to morphine milligram equivalents (MME) using the standard conversion factors from the CDC.
The multidimensional assessment for patients consisted of 64 measures in 10 distinct instruments: (1) Oswestry Disability Index (ODI), (2) PainDETECT, (3) McGill Pain Questionnaire-short form (MPQ), (4) Pain Catastrophizing Scale (PCS), (5) Beck Depression Inventory (BDI), (6) Positive and Negative Affect Schedule (PANAS), (7) Patient-Reported Outcomes Measurement Information System (PROMIS) profile test, (8) 12-item Short Form (SF12) health survey, (9) Multidimensional Assessment of Interoceptive Awareness (MAIA), and (10) NIH Toolbox (containing four domains: sensory [38,39,40], emotion, cognition, and motor function). All questionnaires were collected electronically using REDCap, a browser-based software and workflow methodology for designing clinical and translational research databases. Details of all collected measures are listed below.
- NRS:
An 11-point numerical rating scale, where “0” corresponds to no pain and “10” indicates the worst pain possible (or imaginable).
- COMM:
A self-reported questionnaire containing 17 items, which estimates the risk of opioid misuse.
- ODI:
A ten-item questionnaire measuring functional disability in daily living.
- PainDETECT:
A seven-item screening questionnaire for neuropathic pain.
- MPQ:
A 15-item measure of the affective and sensory components of pain.
- PCS:
A 13-item assessment of thoughts or feelings related to past pain experiences, yielding three subscale scores assessing rumination, magnification, and helplessness.
- BDI:
A 21-item instrument measuring the severity of depression.
- PANAS:
A composite questionnaire containing two mood scales of ten items each; one measures negative affect and the other measures positive affect.
- PROMIS:
This profile test contains 57 items covering seven core domains of health-related quality of life (QOL): pain interference, fatigue, depression/sadness, anxiety/fear, sleep disturbance, physical function, and social activity.
- SF12:
A composite questionnaire of eight sections measuring QOL: general health, physical role function, physical function, social function, emotional role function, mental health, vitality, and bodily pain.
- MAIA:
A 32-item questionnaire measuring eight personal traits of interoceptive awareness: not distracting, not worrying, noticing, attention regulation, emotional regulation, self-regulation, body listening, and trusting.
- NIH Toolbox:
In this study, the sensory part of the NIH Toolbox was composed of pain interference, regional taste (quinine and salt), and odor identification. The emotional assessment of the NIH Toolbox contains ten measures of negative emotion (anger-physical aggression, anger-affect, anger-hostility, sadness, fear-affect, fear-somatic arousal, perceived hostility, perceived stress, perceived rejection, and loneliness) and seven measures of positive emotion (general life satisfaction, meaning and purpose, positive affect, friendship, emotional support, self-efficacy, and instrumental support). The NIH Toolbox also assesses six cognitive functions: picture vocabulary, picture sequence memory, flanker inhibitory control and attention, dimensional change card sort, pattern comparison processing speed, and list sorting working memory. The 2-min walk endurance test and the nine-hole pegboard dexterity test (dominant and non-dominant hand) were assessed as motor functions of the NIH Toolbox in this study.
Statistical Analyses of Psychosocial and Sensorimotor Measures
Demographic measures were compared between the opioid and non-opioid groups using two-tailed unpaired t tests for continuous data or chi-square tests for binary data. Pain duration was log-transformed before comparison to approximate a normal distribution. Missing items within a questionnaire were replaced with the average of the remaining item scores in that questionnaire, provided that fewer than 30% of the items in the scale were unanswered. If more than 30% of the items were blank, the subject was excluded from statistical tests relevant to that scale. In our dataset, one individual in the opioid group did not answer a large portion of the PROMIS and was therefore excluded from all statistical tests relevant to the PROMIS.
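The missing-item rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the function name `impute_scale_items` and the use of `None` to mark an unanswered item are assumptions, and a scale with exactly 30% of items missing is treated here as excluded, a boundary case the text does not specify.

```python
from typing import List, Optional, Sequence

MAX_MISSING_FRACTION = 0.30  # threshold stated in the text


def impute_scale_items(items: Sequence[Optional[float]]) -> Optional[List[float]]:
    """Mean-fill unanswered items of one scale, or return None if the scale is excluded.

    `items` holds one participant's item scores for a single scale;
    None marks an unanswered item (an assumed encoding).
    """
    answered = [x for x in items if x is not None]
    n_missing = len(items) - len(answered)
    # Exclude the scale when nothing was answered or when the missing
    # fraction is not below 30% (boundary handling is an assumption).
    if not answered or n_missing / len(items) >= MAX_MISSING_FRACTION:
        return None
    mean_answered = sum(answered) / len(answered)
    return [mean_answered if x is None else x for x in items]
```

For example, a five-item scale with one blank item (20% missing) is mean-filled, whereas a four-item scale with two blank items (50% missing) returns `None` and the participant is dropped from tests on that scale.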
Differences in all 64 measures of the multidimensional assessment between the groups were assessed using a multivariable regression analysis adjusted for age, sex, pain intensity, log-transformed pain duration, and history of back surgery. Next, Pearson’s correlation of each measure with pain intensity was computed in each group. We then tested the group difference in the correlation coefficients by applying Fisher’s z transformation to the coefficients and evaluating statistical significance under the standard normal distribution. We employed two alpha thresholds to account for multiple comparisons. The first was a Bonferroni-corrected threshold within a subgroup of measures, where the threshold depended on the number of measures in that subgroup. For example, because the PROMIS contains seven measures, the alpha was 0.05/7 (≈ 0.007). The second was a Bonferroni-corrected threshold for the entire assessment of 64 measures (0.05/64 ≈ 0.0008). Effect sizes for the difference between the two group means were assessed using Cohen’s d in MATLAB 2016a (The MathWorks, Natick) and JMP Pro version 13.2 (SAS Institute, Cary, NC).
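As a rough illustration of the tests named in this paragraph, the following Python sketch compares two independent Pearson correlations via Fisher’s z transformation, computes a Bonferroni-corrected alpha, and computes Cohen’s d. The function names are hypothetical, the standard error formula is the textbook one for independent samples, and the pooled-standard-deviation form of Cohen’s d is an assumption, since the text does not state which variant was used. The study itself used MATLAB and JMP Pro.

```python
import math
import statistics


def fisher_z_difference(r1: float, n1: int, r2: float, n2: int):
    """Two-tailed test for the difference between two independent Pearson r values."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher's z transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    # Two-tailed p value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


def bonferroni_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def cohens_d(group_a, group_b) -> float:
    """Cohen's d using the pooled standard deviation (assumed variant)."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd
```

With these helpers, `bonferroni_alpha(7)` gives the PROMIS subgroup threshold and `bonferroni_alpha(64)` gives the threshold for the entire assessment.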
To reduce the dimensionality of the assessments, a principal component analysis with varimax rotation was performed on the measures identified as statistically significant at the Bonferroni-corrected threshold (p < 0.0008) in each group. Two criteria were used for component selection: (1) an eigenvalue greater than 2 and (2) explained variance above 10%. After the principal components were selected, component scores were computed for each patient with CBP on each principal component and standardized across subjects to z scores.
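The two selection criteria and the z-scoring step can be sketched as follows. This is an illustrative sketch only: it assumes the eigenvalues of all components are already available from the principal component analysis (the decomposition and varimax rotation are not reproduced here), that explained variance is each eigenvalue divided by the sum of all eigenvalues, and that z scores use the sample standard deviation. The function names are hypothetical.

```python
import statistics


def select_components(eigenvalues, min_eigenvalue=2.0, min_variance_pct=10.0):
    """Return indices of components that satisfy both selection criteria."""
    total = sum(eigenvalues)  # total variance across all components (assumed)
    selected = []
    for i, ev in enumerate(eigenvalues):
        variance_pct = 100.0 * ev / total
        # Criterion 1: eigenvalue > 2; criterion 2: explained variance > 10%.
        if ev > min_eigenvalue and variance_pct > min_variance_pct:
            selected.append(i)
    return selected


def z_score(scores):
    """Standardize one component's scores across subjects."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD (n - 1); an assumption
    return [(s - mean) / sd for s in scores]
```

For instance, with hypothetical eigenvalues `[5.0, 2.5, 1.5, 1.0]`, only the first two components pass both criteria.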
Associations of the identified principal components with pain intensity and opioid-related measures (MME and COMM) were examined in each group using a multivariable regression analysis adjusted for age, sex, log-transformed pain duration, and history of back surgery. In this analysis, the MME was log-transformed to approximate a normal distribution.
Blood Concentration of Opioids
Blood samples were drawn from each participant before the MRI scan and the psychosocial and sensorimotor assessments. Plasma concentrations of opioids were measured using high-performance liquid chromatography tandem mass spectrometry (HPLC–MS/MS), with the mass spectrometer operating in positive ion and multiple reaction monitoring acquisition mode, after sample preparation by solid-phase extraction or simplified liquid extraction. All samples were run in duplicate, and the average concentration was used.
The opioid concentrations were then converted to morphine equivalent units (parenteral morphine) using a method similar to that described above, log-transformed, and compared with pain intensity and the identified principal components. Five individuals were excluded from this analysis because the collected blood sample volumes were insufficient.
Scanning Parameters of Magnetic Resonance Imaging
Participants were scanned on a 3-T Siemens Prisma scanner. High-resolution T1-weighted brain images were acquired using integrated parallel imaging techniques (PAT; GRAPPA), which are receiver coil-based data acceleration methods. The acquisition parameters were isotropic voxel size = 1 × 1 × 1 mm, TR = 2300 ms, TE = 2.40 ms, flip angle = 9°, acceleration factor = 2, base resolution = 256, slices = 176, and field of view (FoV) = 256 mm. The encoding direction was anterior to posterior, and the acquisition time was 5 min 21 s.
Voxel-Based Morphometry of Cortical Gray Matter
Intracranial volume normalized for individual head size was estimated with SIENAX, part of FSL. Gray matter density was examined using FSL-VBM, an optimized voxel-based morphometry (VBM) analysis in FSL. Statistically significant voxel clusters were identified using threshold-free cluster enhancement (TFCE) with a family-wise error corrected p value of less than 0.05. Neurosynth term-based reverse inference [51, 52] was then used to decode the identified brain region (see details in the supplementary methods).
Gray matter density in the identified cluster was averaged for each participant, then compared between the opioid and non-opioid groups using a multivariable regression analysis which adjusted for age, sex, pain intensity, log-transformed pain duration, history of back surgery, and intracranial volume. Next, we examined the associations of the averaged gray matter density with pain intensity, the identified principal components, log-transformed MME, or COMM, again adjusting for age, sex, and intracranial volume. We also performed volumetric analyses of subcortical nuclei and hippocampal subfields (see supplementary material).
Neurosynth Term-Based Reverse Inference
This meta-analytical tool contains a database with activation coordinates for a total of 14,371 functional MRI studies, paired with their associated cognitive and anatomical terms (http://neurosynth.org/decode/) [51, 52]. The decoder takes in the voxel-wise representation of the region, cross-references it with the full database, and returns a list of terms and their correlation values to the region. Here, we retrieved the top 50 non-anatomical terms (of about 1700 terms generated) showing the greatest correlation.
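The term-retrieval step can be sketched as a simple filter and sort. This sketch assumes the decoder output has already been loaded into a dictionary mapping each term to its correlation with the region, and that a set of anatomical terms to exclude has been compiled separately. Both inputs and the function name are hypothetical, and the sketch does not call the Neurosynth service itself.

```python
def top_non_anatomical_terms(term_correlations, anatomical_terms, n=50):
    """Return the n non-anatomical (term, correlation) pairs with the highest correlation.

    term_correlations: dict mapping term -> correlation with the region (assumed format).
    anatomical_terms: set of terms treated as anatomical and excluded (assumed input).
    """
    kept = [(term, r) for term, r in term_correlations.items()
            if term not in anatomical_terms]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # highest correlation first
    return kept[:n]
```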
Volumetric Analysis of Subcortical Nuclei
Volume extraction of subcortical nuclei from T1-weighted images was performed with FSL’s integrated registration and segmentation tool (FIRST), which computed the volumes of all subcortical nuclei, including the thalamus, caudate, putamen, pallidum, hippocampus, amygdala, and accumbens. Volume comparison between the groups was performed with a multivariable regression analysis adjusting for age, sex, pain intensity, log-transformed pain duration, history of back surgery, and intracranial volume.
Volumetric Analysis of Hippocampal Subfields
Volumetric segmentation of hippocampal subfields was performed on T1-weighted images with FreeSurfer 6.0, which is documented and freely available for download online (http://surfer.nmr.mgh.harvard.edu/). Based on a statistical atlas built primarily upon ultra-high resolution (ca. 0.1 mm isotropic) ex vivo MRI data, this tool generates an automated segmentation of the 24 hippocampal subfields, including left/right parasubiculum, presubiculum, subiculum, cornu ammonis 1 (CA1), CA3, CA4, granule cell and molecular layer of the dentate gyrus (GC-ML-DG), molecular layer, hippocampal amygdala transition area (HATA), fimbria, hippocampal tail, and hippocampal fissure. Group differences in these volumes were examined using a multivariable regression analysis adjusting for age, sex, pain intensity, log-transformed pain duration, history of back surgery, and intracranial volume. Significant differences were identified using a family-wise error (FWE) corrected p threshold (0.05/24 ≈ 0.002).
Statistical analyses were performed using MATLAB 2016a (The MathWorks, Natick) and JMP Pro version 13.2 (SAS Institute, Cary, NC). The brain schema was visualized on a surface rendering of a human brain atlas with the BrainNet Viewer (http://www.nitrc.org/projects/bnv/). The term visualization was performed using a word cloud (Text Analytics Toolbox in MATLAB).