Key Points

The site of measurement (in person versus remote) may impact how the patient scores the assessment.

When comparing different age groups across the seven subdomains, we found that there were significant differences in the remote versus in-person assessment.

For those patients less than 40 years of age, depression and sleep disturbance were rated statistically worse in the remote setting, whereas in the other five domains of the PROMIS-29 v2.1 there were no statistical differences.

For those aged 40–59, the only statistical difference was worse sleep disturbance reported in the remote setting, while the other domains remained statistically nonsignificant. For those aged 60–79, statistically worse function was reported in the in-person assessment for fatigue, physical function, and pain interference.

For those aged 80 or above, statistically greater dysfunction was reported with remote assessment for social participation (ability), depression, and physical function, while anxiety was statistically worse for the in-person assessment.

Introduction

Clinical data represent the resource most central to advancing healthcare and have the potential to generate quality information that aids the acquisition of new knowledge and guides the development of best practices [1, 2]. Pain is among the most common reasons for accessing the healthcare system in the United States [3]. Back pain alone is estimated to be the eighth most costly chronic condition in individuals aged 18–64 years and the third most prevalent disease group [4,5,6]. Current data on the incidence, prevalence, and consequences of pain are neither comprehensive nor completely accurate, in part because the data that are collected around pain are not uniform and are related to underlying conditions or events [3, 7,8,9]. Creating a unified measurement language is essential to the effective communication of patients’ needs and outcomes, appropriate triage, and improved care delivery. The National Institutes of Health (NIH) and Institute of Medicine (IoM) have promoted the need for better validation tools to measure the incidence, prevalence, and outcomes in the field of pain management [7]. One key reason for the lack of data on pain is that there are currently no standardized methods, definitions, or survey questions regarding pain used in population-based studies across and within agencies [7]. Multidimensional patient-reported outcome (PRO) [7] assessments are now recommended by the IoM [10], the United States Department of Health and Human Services (HHS) [11], the NIH Research Task Force (RTF) for Chronic Low Back Pain [10], and the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT) [12], amongst other recognized consensus panels in the field. Multidimensional PROs have changed the measurement of pain reporting from the once linear numerical rating scales (NRS) and visual analog scales (VAS) to a comprehensive biopsychosocial evaluation of patients’ well-being.

In 2004, the NIH created the Patient-Reported Outcomes Measurement Information System (PROMIS) to validate outcome measures with greater accuracy [13]. The PROMIS-29 set assesses mental health, physical health, and social health through seven four-question instruments (fatigue, pain interference, physical function, sleep disturbance, anxiety, depression, and ability to participate in social roles and activities) plus a single pain intensity item [14]. When evaluating PROMIS instruments, a T-score of 50 is the average for the US general population, with a standard deviation of 10. The T-score is reported with a standard error. Larger T-scores represent a larger degree of the domain being measured, meaning that larger T-scores for negatively worded concepts (e.g., fatigue) represent worse than average scores, whereas larger T-scores for positively worded attributes (e.g., physical function) represent better than average scores (see Table 1).
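The T-score convention described above can be illustrated with a short sketch. It assumes only the reference mean of 50 and standard deviation of 10 stated in the text; the function names and the set of positively worded domains are hypothetical labels chosen for illustration and are not part of any PROMIS scoring software.

```python
# Reference values stated in the text: US general population mean 50, SD 10.
REFERENCE_MEAN = 50.0
REFERENCE_SD = 10.0

# Assumption for illustration: domains where a HIGHER T-score means BETTER health.
POSITIVELY_WORDED = {"physical_function", "social_participation"}


def t_score_to_z(t_score):
    """Express a T-score as standard deviations from the reference mean."""
    return (t_score - REFERENCE_MEAN) / REFERENCE_SD


def direction_vs_reference(domain, t_score):
    """Say whether a T-score is better or worse than the reference average."""
    if t_score == REFERENCE_MEAN:
        return "average"
    higher = t_score > REFERENCE_MEAN
    # For positively worded domains higher is better; otherwise higher is worse.
    better = higher if domain in POSITIVELY_WORDED else not higher
    return "better than average" if better else "worse than average"


# Mean remote scores reported in the Results: pain interference 64.4,
# physical function 36.5.
print(round(t_score_to_z(64.4), 2), direction_vs_reference("pain_interference", 64.4))
print(round(t_score_to_z(36.5), 2), direction_vs_reference("physical_function", 36.5))
```

Under this convention, both a pain interference score above 50 and a physical function score below 50 indicate worse-than-average health, which is why the direction of each domain has to be tracked separately.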

Table 1 Baseline demographic and PROMIS-29 v2.1 data stratified by impact score (N = 25,187)

The PROMIS-29 has been used in thousands of clinical studies, published in over 150 journals, and is validated with cross-talk to other instruments. However, it is essential that the data input is of sufficient quality to be used in clinical decision-making. To our knowledge, there has been no inquiry into whether the clinical setting influences PROMIS-29 baseline data. It is known that several external factors, including environment and psychology, affect how individuals conduct self-reports [15]. The quality of the input data determines not only the reliability of validation tools, accuracy of research outcomes, and development of best practices, but also the performance of clinical decision support systems (CDSS) and reimbursement rates. Understanding the influence that a patient’s immediate environment may have on data input could be an important component for data collection tools. Nomogram data were recently reported on patients presenting to spine and pain practices, with surprising homogeneity, exclusive of significant differences between the < 40 and 80+ age groups in both sleep disturbance and physical function [16]. The objective of this study is to determine whether patients’ baseline self-assessment, as represented by the PROMIS-29, is different if the assessment is performed in person at the clinic versus completed remotely (e.g., at home). Clearly, this validation is not only necessary for ongoing clinical research investigations, but is also clinically relevant during the COVID-19 pandemic transition to telehealth, with the use of the emergency 1135 waiver [17]. To date, there are no studies or data published on whether the initial intake data among the chronic pain population are influenced by the clinical environment, and if so, what contextual information plays a key role in data integrity.

Methods

This study is a retrospective, multicenter quantitative analysis of new patient baseline pain data quality. A waiver of consent and a full waiver of HIPAA (Health Insurance Portability and Accountability Act of 1996) authorization were obtained through the Western Institutional Review Board (WIRB) under Common Rule 45 CFR 46.116. At 12 participating sites in the USA, subjects were consecutively enrolled if they were new patients presenting to spine and pain practices from August 2018 to December 2020. The inclusion criteria were as follows: new patients entering a participating pain or spine practice, > 18 years of age, English-speaking, and completion of the PROMIS-29 within 7 days of the initial appointment. Exclusion criteria comprised existing patients at the practice and individuals who did not complete the PROMIS-29 within 7 days of their initial intake appointment. Patients were not excluded for comorbidities such as psychiatric disease, neurological disease, or other active disease issues.

The PROMIS-29 (v2.1) consists of the following assessments: PROMIS Short Form (SF) v2.0–Physical Function 4a, PROMIS SF v1.0–Anxiety 4a, PROMIS SF v1.0–Depression 4a, PROMIS SF v1.0–Sleep Disturbance 4a, PROMIS SF v1.0–Ability to Participate in Social Roles and Activities 4a, PROMIS SF v1.0–Pain Interference 4a, PROMIS SF v1.0–Fatigue 4a, and PROMIS Pain Intensity item–Numerical Rating Scale (Global07).

The complete PROMIS-29 assessment, as well as patient demographic data including age and gender, was captured using a digital outcome capture system (Real World Outcomes™, Celéri Health, Wilmington, DE, USA) within 7 days of the initial consultation for pain care. Patient location at the time of data input was captured as either at the clinical visit or in a remote setting outside the clinic. Remote setting data collection did not include visual (video) technology. Statistical analysis was performed, calculating descriptive statistics of the median, mean, and mode of the reported T-scores of the PROMIS-29 for its seven independent domains. No deviation from protocol occurred.
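As a minimal sketch of the descriptive statistics named above (mean, median, and mode of reported T-scores), the following uses only the Python standard library. The function name and the sample values are hypothetical and serve only to show the calculation; they are not study data, and the study itself used SPSS.

```python
from statistics import mean, median, multimode


def describe_t_scores(t_scores):
    """Return the mean, median, and mode(s) of a list of T-scores."""
    return {
        "mean": mean(t_scores),
        "median": median(t_scores),
        # multimode returns every most-frequent value, so ties are not hidden.
        "mode": multimode(t_scores),
    }


# Illustrative values only (not taken from the study data set).
example_scores = [50.0, 55.0, 55.0, 60.0]
print(describe_t_scores(example_scores))
```

Returning all modes rather than a single value is a design choice for this sketch: short-form T-scores take a limited set of discrete values, so ties between the most frequent scores are plausible.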

Statistics

SPSS version 22 software was used to perform the analysis. When comparing different age groups across the seven subdomains, along with the gender assessment and the influence of remote versus non-remote site of service, Student’s t-test was employed to test the differences between groups. Data are presented as mean ± standard deviation. Statistical significance was defined as a p value less than or equal to 0.05.
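For readers unfamiliar with the test, a two-sample Student's t-test with pooled variance can be sketched as follows. This is not the authors' SPSS procedure: the function name is hypothetical, and the p value uses a normal approximation to the t distribution (an assumption made to keep the example within the Python standard library), which is reasonable only when the degrees of freedom are large, as they would be for groups of thousands of patients.

```python
import math
from statistics import NormalDist, mean, variance


def student_t_test(group_a, group_b):
    """Two-sample Student's t-test assuming equal variances.

    Returns (t statistic, degrees of freedom, approximate two-sided p value).
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (mean(group_a) - mean(group_b)) / standard_error
    # ASSUMPTION: normal approximation to the t distribution (large df only).
    p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, df, p_approx


# Tiny illustrative samples (not study data); far too small for the normal
# approximation to be trusted, shown only to demonstrate the mechanics.
t_stat, df, p_approx = student_t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t_stat, df, round(p_approx, 4))
```

In practice an exact p value would come from the t distribution with the returned degrees of freedom, as statistical packages do; the result would then be compared against the 0.05 threshold stated above.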

Results

A total of 25,187 unique patients assessed between August 2018 and December 2020 were enrolled in the study. Patients’ baseline PROMIS-29 v2.1 data were obtained upon entry into spine or pain practices across the United States. Patients were categorized by age group (less than 40 years, 40–59 years, 60–79 years, and 80 years or older), along with an assessment of the effects of gender on the measured sample and site of assessment completion (remote versus in person). Gender did not have a significant effect on the sample. Of the 25,187 patients assessed, 4806 (19.1%) completed the survey in the remote setting, whereas 20,381 (80.9%) completed the survey in the clinic. With regard to age, 2350 patients were less than 40 years of age, 8204 were aged 40–59 years, 12,111 were aged 60–79 years, and 2516 were over the age of 80.
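The stratification and the reported proportions can be reproduced with a small sketch. It assumes the age boundaries used in the counts above (less than 40, 40–59, 60–79, and 80 or older); the function names and group labels are hypothetical.

```python
def age_group(age):
    """Assign an age in years to one of the four strata (assumed boundaries)."""
    if age < 40:
        return "<40"
    if age < 60:
        return "40-59"
    if age < 80:
        return "60-79"
    return "80+"


def percent(part, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100 * part / total, 1)


# Counts reported in the text: 4806 remote and 20,381 in-clinic of 25,187.
total = 25187
print(percent(4806, total), percent(20381, total))
print([age_group(a) for a in (19, 40, 59, 60, 79, 80)])
```

The two setting counts sum to the total sample, and the computed percentages match the reported 19.1% and 80.9%.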

The PROMIS-29 v2.1 battery consists of seven instruments. For remote survey responders, covering all age groups, the average PROMIS-29 impact scores reported for ability, anxiety, depression, fatigue, sleep disturbance, physical function, and pain interference were 40.0, 54.3, 54.5, 58.4, 58.1, 36.5, and 64.4, respectively. In comparison, the non-remote survey responders reported averages of 40.1, 53.7, 54.2, 58.3, 60.0, 36.0, and 64.6, respectively. These groups were analyzed to detect differences amongst these cohorts in terms of their population profile related to mental, physical, and social health, sorted by subject age, assessing the mean, median, and mode for each age group.

Mental Health

The mean depression assessment scores from the PROMIS SF v1.0 Depression scale demonstrate averages of 55.5, 54.8, 53.8, and 55.5 for age categories of less than 40, 40–59, 60–79, and above 80 years, respectively, in the remote setting. The mean anxiety assessment scores from the PROMIS SF v1.0 Anxiety 4a were 54.3 for the remote and 53.7 for the non-remote setting. The anxiety scale averages for the remote setting by age were 40.8, 39.7, 40.3, and 39.6 for age categories of less than 40, 40–59, 60–79, and above 80 years, respectively. In the age category of less than 40 years, the remote versus non-remote rating of depression was significantly different (p = 0.003), with patients reporting a higher degree of depression in the remote assessment. Interestingly, patients above 80 years of age demonstrated statistically significant changes in rating depression (p = 0.025) in the remote setting and anxiety (p = 0.031) in the non-remote setting.

Physical Health

The pain interference assessment from the PROMIS SF v1.0 Pain Interference 4a scale demonstrates averages of 64.4 in the remote and 64.6 in the non-remote setting. The remote averages for age categories less than 40, 40–59, 60–79, and above 80 years were 64.7, 65.0, 63.9, and 64.1, respectively. There was statistically significant worsening dysfunction for the 60–79 age group for in-person, non-remote assessments (p = 0.005; 63.9 vs. 64.4).

Fatigue scores assessed via the PROMIS SF v1.0 Fatigue 4a in the chronic pain population averaged 58.4 for the remote setting versus 58.3 in the non-remote setting, with average remote scores of 59.2, 59.3, 57.2, and 58.2, respectively, for age categories of less than 40, 40–59, 60–79, and above 80 years. There was a significant difference in the 60–79 age group (p = 0.020; 57.2 vs. 57.7) for fatigue, with greater dysfunction in the non-remote, in-person assessment.

Physical function, measured with the PROMIS SF v2.0 Physical Function 4a, on which higher scores represent better function and lower scores represent greater dysfunction, averaged 36.5 in the remote versus 36.0 in the non-remote setting. The average remote scores were 37.9, 36.7, 36.4, and 33.0 for age groups of less than 40, 40–59, 60–79, and above 80 years, respectively. There were significant differences in the 60–79 age group favoring greater dysfunction for the in-person assessment (p < 0.0001; 36.4 vs. 35.7), while less dysfunction was reported for the in-person assessment in age group > 80 (p = 0.25; 33.0 vs. 34.0).

Mean sleep disturbance scores as measured by the PROMIS SF v1.0 Sleep Disturbance 4a were 58.1 and 58.3 in the remote and non-remote settings, respectively, with age categories of less than 40, 40–59, 60–79, and above 80 years in the remote setting reporting values of 60.5, 60.0, 56.1, and 53.8, respectively. There were significant differences for worsening sleep disturbance for remote assessments for age groups < 40 (p < 0.0001; 60.5 vs. 58.9) and 40–59 (p < 0.0001; 60.0 vs. 59.1).

Those aged 60–79 showed a significant difference in the remote versus non-remote ratings for pain interference (p = 0.005; 63.9 vs. 64.4), physical function (p < 0.0001; 36.4 vs. 35.7), and fatigue (p = 0.020; 57.2 vs. 57.7), demonstrating worsening dysfunction with in-person assessments.

Social Function

Mean scores on the PROMIS SF v2.0 Ability to Participate in Social Roles and Activities 4a scale were 40.0 for remote and 40.1 for non-remote assessments. In the remote setting the mean scores were 40.3, 39.7, 40.3, and 39.6 for patients less than 40, 40–59, 60–79, and above 80 years of age, respectively. Only those above 80 years of age showed a significant difference in rating of ability in the remote compared to the non-remote setting (p = 0.031; 39.6 vs. 40.7), with worsening dysfunction in the remote assessment.

Gender Difference Analysis

The influence of gender on PROMIS-29 assessment was evaluated across the seven domains, including remote versus non-remote setting, for patients entering a chronic pain or spine practice in the United States, and no significant difference was determined.

Discussion

This study is the first analysis of the influence of patient location on the PROMIS-29 v2.1 scoring across the seven domains within the instrument, as determined by age and gender. The results have significant implications for baseline PRO assessments across different patient settings, as influenced by age. The PROMIS-29 is generalized to the entire US population, and as such, can serve as a common measurement language that is universally relevant across disease states. The creation of a normative data set for the chronic pain population at multiple sites was performed in a previous work [17]; this study serves to evaluate the influence of the site of patient instrument completion (remote versus in person) on PROMIS-29 baseline presentation in the chronic pain and spine population presenting for care.

For most PROMIS-29 instruments, the average T-score for the US general population is 50, with a standard deviation of 10 [16, 17]. A greater T-score signifies a larger measurement on the domain. In other words, a greater T-score for negatively worded concepts represents a worse than average score, whereas higher T-scores of positively worded attributes represent a better than average score. Chronic pain patients demonstrate statistically higher reported dysfunction than the general population across the multiple domains assessed by the PROMIS-29.

When comparing different age groups across the seven subdomains, we found that there were significant differences in the remote versus in-person assessment. For those patients less than 40 years of age, depression and sleep disturbance were rated statistically worse in the remote setting, while in the other five domains of the PROMIS-29 v2.1 there were no statistical differences. The statistically worse depression and sleep disturbance ratings in the remote setting presented an interesting finding. However, we would naturally expect patients in isolation to rate these variables as worse. For those aged 40–59, the only statistical difference was worsening sleep disturbance reported in the remote setting, while the other domains remained statistically nonsignificant. For those aged 60–79, statistically different worsening function was reported in the in-person assessment for fatigue, physical function, and pain interference. For those aged 80 or above, statistically greater dysfunction was reported in remote assessment for social participation (ability), depression, and physical function, while anxiety was statistically worse for the in-person assessment. One possible reason for the differences in remote versus in-person evaluation may stem from patients believing that providers are more likely to review their responses when they complete the surveys in person.

Overall, this study reports on a diverse cohort with fairly generalizable results. Patients were examined in different pain and spine clinics across the United States, presenting with various diagnoses, past medical history, and biological and environmental influences, including medications, surgeries, and social factors. The study had the advantage of being able to administer the survey remotely in a timely fashion on an easy user interface. Furthermore, the study was cost-effective and could be administered through multiple platforms.

There are several limitations to this study. Although the data were collected prospectively, our analysis is retrospective. This study presents the effect of the site of the assessment of the PROMIS-29 v2.1 on the scoring, and did not discern the reason the patient elected to be assessed remotely or in person. During the study period, in response to the COVID-19 pandemic [17,18,19,20], the Centers for Medicare and Medicaid Services (CMS) relaxed regulations to expand telehealth, allowing telehealth services to be delivered directly in the patient’s home. On March 18, 2020, CMS recommended limiting nonessential care [18]. This highlighted how differently medical practice is governed from state to state in the United States; the response to COVID-19 has been highly variable based upon the state agency [19, 20]. This study is a multicenter study across the United States that took place during a time of dissimilar state and local regulations. Therefore, these factors may have had an impact on the reported data given the context in which some surveys were collected. Specifically, the COVID-19 pandemic may have isolated more patients and affected their responses regarding anxiety, depression, reported sleep disturbance, physical health, fatigue, and pain interference. To elucidate this discrepancy, further analysis of survey data at multiple time points, comparing before-and-after outcomes in the remote and non-remote settings, would be beneficial. Such an analysis would help determine how best to deliver measurement instruments for the most accurate data collection. Further limiting the study is the potential presence of confounding factors, such as current use of psychiatric medications, patient education level, comorbidities, and duration of chronic pain. Additional analyses are needed to evaluate potential confounders.

Conclusion

Patient outcome assessment is shifting toward a complete survey of patient well-being using universally relevant measures rather than linear scores. This data set is the first published normative data set demonstrating the use of the PROMIS-29 measurement in the remote and non-remote setting before and during a pandemic in the chronic pain population, with a clear impact on the baseline scoring, based on age. Despite stratified analysis of gender by each category, no significant difference appeared in remote versus in-person assessment. This insight has implications for measurement from baseline scores, based on site of service, and has future implications for longitudinal data sets. Further study is needed to fully appreciate the implications of the norms of this population and the influence of environment on data collection.