Key Summary Points

Why carry out this study?

Chronic pain (CP) is a complex multidimensional experience severely affecting the quality of life of individuals. Multiple cognitive, affective, emotional, and interpersonal factors play a major role in CP.

The psychological, social, and physical circumstances leading to CP show high inter-individual variability, thus making it difficult to identify core syndrome characteristics.

In a biopsychosocial perspective, we aim at identifying a pattern of psycho-physical impairments that can reliably discriminate between CP individuals and healthy controls (HC) with high accuracy and estimated generalizability using machine learning.

What was learned from the study?

Our psycho-physical classifier could discriminate CP from HC with 86.5% balanced accuracy and significance (p = 0.0001). The most reliable features characterizing CP were anxiety and depression scores, and belief of harm consequent to prolonged pharmacological treatments; for HC, the most reliable features were physical and occupational functioning, and vitality levels.

We think that our algorithm provides novel insights about potential individualized targets for CP-related early intervention programs.

Introduction

Pain is defined as an unpleasant sensory or emotional experience associated with actual or potential tissue damage, or described in terms of such damage [1, 2]. While acute pain is an early warning sign that usually elicits reflex withdrawal and thereby promotes survival [3], chronic pain (CP) lasts longer than the injury itself, usually more than 3 months, and is an expression of nervous system maladaptation. CP affects 10–30% of the adult population in Europe [4, 5]. It is nowadays a major public health issue and is associated with a high economic and social burden [6], not only for individuals experiencing pain but also for their family, caregivers, and friends [7].

CP was recently defined as “a complex multidimensional experience” [4], to the extent that the biopsychosocial model is now considered the most valid paradigm for understanding the complexity of this syndrome. Indeed, in addition to physical functioning impairments, CP individuals show decreased productivity, increased mood swings or more severe depressive mood, impaired social functioning, increased sleep disturbances, decreased cognitive ability [8], and decreased participation in leisure activities [9] compared with healthy controls (HC). Overall, these impairments lead to a marked decrease in the overall perceived and objective quality of life in people experiencing CP, compared with HC. Notably, not only is CP architecture complex and multifactorial, but CP individuals are also very heterogeneous. Furthermore, the psychological, social, and physical circumstances leading to CP show a high degree of inter-individual variability [10]. Therefore, the identification of a psycho-physical signature able to reliably discriminate CP from HC at the single-subject level [11] may aid in diagnostic categorization. Such a signature could potentially provide novel insights about specific targets that might be central in individualized early identification and intervention programs.

So far, studies have characterized CP and HC differences in terms of group differences through univariate statistics, which are limited in terms of generalizability assessments [12, 13]. Moreover, to understand whether a syndrome-associated characteristic could also be qualified as a marker (i.e., as a measurable feature associated with a certain condition or process [14, 15]), one should investigate its sensitivity and specificity in identifying the respective patient population [12]. A promising way of addressing this question is by employing machine learning, which allows one to quantify the sensitivity, specificity, and generalizability of a disease signature at the single-subject level [16, 17], rather than just describing it at the group level. Therefore, machine learning allows one to (i) grasp the complex architecture of CP given by the multiple variables associated with the condition, and (ii) potentially clarify the factors underlying the high inter-individual heterogeneity between CP individuals.

While machine learning techniques have recently been applied to CP classification using neuroimaging data [18,19,20,21], to the best of our knowledge, no study has employed self-administered or observer-rated clinical assessments within a machine learning environment with the same aim. Therefore, following a biopsychosocial perspective, the aim of this study is to identify a pattern of psychological and physical impairments that can reliably discriminate between CP and HC with high accuracy and estimated generalizability using machine learning.

Methods

Sample Determination

A total of 118 consecutive outpatients (63.6% female; mean age 57.1 years) with chronic pain (CP), treated with analgesic medication, were recruited from the Pain Therapy Clinic, University of Bari General Hospital. Individuals diagnosed with CP had a subjective experience of physical pain, localized anywhere in the body, for at least 3 months, as defined by the European Federation of International Association for the Study of Pain (IASP). In addition, 86 healthy controls (HC) were recruited either through flyers or among the CP individuals’ and caregivers’ (i.e., those who accompanied CP individuals to the visit) social networks. Therefore, our HC group was mainly composed of individuals who (i) were in contact with either CP individuals or, if present and available, their caregivers, (ii) learned of the research study via either CP individuals or, if present and available, their caregivers, and (iii) were willing to take part in the research. No CP caregiver entered the study as a HC. Common exclusion criteria for HC and CP were the presence of any other significant clinical condition (besides the CP diagnosis) or the presence of any Axis I psychiatric disorder, according to the Structured Clinical Interview for DSM-IV [22]. The demographic and clinical characteristics of CP and HC individuals are reported in Table 1. Independent-samples t tests, chi-square tests, and Fisher’s exact tests were employed to investigate the presence of any demographic differences between the two groups. All p values were multiple-comparison-corrected through the false discovery rate (FDR) method, following published procedures [23]. The significance was determined at α = 0.05.
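The FDR correction mentioned above can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg step-up procedure, which is the most widely used FDR method; the procedure cited as [23] may be a different variant. The function name `fdr_bh` and the raw p values are hypothetical and are not taken from Table 1.

```python
def fdr_bh(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values, sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so the adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for five demographic comparisons
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adjusted = fdr_bh(raw)
# A comparison counts as significant when its adjusted p value is <= alpha
significant = [p <= 0.05 for p in adjusted]
```

In this toy case the third and fourth comparisons fall below 0.05 before correction but not after it, which is the situation the correction is meant to guard against.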

Table 1 Demographic characteristics of chronic pain (CP) and healthy control (HC) individuals

This research was conducted ethically in accordance with the World Medical Association (www.wma.net) Declaration of Helsinki of 1964 and its later amendments. All individuals gave their written informed consent before entering the study. The research protocol was approved by the local ethics committee of Bari University Hospital.

Psychological Assessment

All individuals were administered the following questionnaires:

  1. Beliefs about Medicines Questionnaire, General section (BMQ-General), to broadly investigate cognitive beliefs about the utility and the harm of pharmacological treatments. The BMQ-General section includes two factors: the degree to which medicines are judged harmful, together with agreement that they should not be taken continuously (General-Harm), and agreement that doctors tend to prescribe too many drugs (General-Overuse). The BMQ-Specific section was administered but was purposely excluded from further analyses to avoid introducing a bias related to the fact that CP individuals were under analgesic treatment, while HC were not. The Italian version of the BMQ shows high validity and reliability [24].

  2. The Hospital Anxiety and Depression Scale (HADS), to determine individual levels of anxiety and depression. The Italian version of the HADS shows good validity and reliability [25].

  3. 36-Item Short Form Health Survey (SF-36), to evaluate individual levels of quality of life across eight domains: vitality, physical functioning, bodily pain, general health perceptions, physical role functioning, emotional role functioning, social role functioning, and mental health. The “bodily pain” domain was excluded from further analyses given that this factor could be highly related to the label (i.e., CP vs. HC). The Italian version of the SF-36 [26] has high reliability and good validity.

  4. Coloured Progressive Matrices (CPM), to evaluate non-verbal intelligence. Both raw and age- and education-corrected scores were entered in the analyses. The CPM have very good psychometric properties, especially in terms of validity and reliability [27].

Machine Learning Analysis

The overall analytical strategy was to use all the subscales and scores acquired through the BMQ-General, HADS, SF-36 (bodily pain subscale excluded, see Sect. 2.2), and the CPM to build a psycho-physical multivariate model able to accurately discriminate between HC and CP. In total, the model was built based on 13 features, which are all reported in Table 2. Two-sample t tests were employed to assess differences between HC and CP for each of the features entering the machine learning algorithm. All p values were < 0.05, FDR corrected for all the subscales used [23]. Machine learning analyses were performed using NeuroMiner version 1.0 software (https://github.com/neurominer-git?tab=repositories). All analysis steps are described in the following sections.

Table 2 Chronic pain (CP) and healthy controls (HC) mean and standard deviation values for each of the features entered in the machine learning algorithm (assessments are fully described in Sect. 2.2)

Cross-Validation Framework

To prevent information leaking between individuals used for training and testing the models [28], we built a double-cycle, nested cross-validation (CV) framework [16]. Specifically, we first split the data into training and test sets on an outer (CV2) cycle, and then split the resulting training folds again into inner (CV1) training and test data [29]. Nested CV therefore enforces a strict separation between training and test data: parameter optimization is performed within the inner (CV1) cycle, and the generalization error is estimated only from the outer (CV2) cycle. CV2 test samples were never seen by the classification algorithms during the entire training process [28]. At both the inner (CV1) and outer (CV2) levels, we employed tenfold CV. We extended nested CV to repeated nested CV [12] at both levels by randomly permuting the participants within their groups (number of permutations: CV1 = 5, CV2 = 10) and repeating the CV cycle for each of these permutations.
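The fold structure described above can be sketched as follows. This is an illustrative index generator, not NeuroMiner's implementation: the function names, the round-robin stratification, and the seed are assumptions, while the fold counts, permutation counts, and group sizes (118 CP, 86 HC) are taken from the text.

```python
import random


def stratified_folds(indices_by_class, k, rng):
    """Split subject indices into k folds with similar class proportions."""
    folds = [[] for _ in range(k)]
    for members in indices_by_class.values():
        shuffled = members[:]
        rng.shuffle(shuffled)  # permute participants within their group
        for position, subject in enumerate(shuffled):
            folds[position % k].append(subject)
    return folds


def repeated_nested_cv(labels, k_outer=10, k_inner=10,
                       n_outer_perm=10, n_inner_perm=5, seed=0):
    """Yield (cv1_train, cv1_test, cv2_test) index lists for every split."""
    rng = random.Random(seed)
    by_class = {}
    for subject, label in enumerate(labels):
        by_class.setdefault(label, []).append(subject)
    for _ in range(n_outer_perm):
        outer_folds = stratified_folds(by_class, k_outer, rng)
        for cv2_test in outer_folds:
            held_out = set(cv2_test)
            # Only subjects outside the CV2 test fold enter the inner cycle
            inner_pool = {
                label: [s for s in members if s not in held_out]
                for label, members in by_class.items()
            }
            for _ in range(n_inner_perm):
                inner_folds = stratified_folds(inner_pool, k_inner, rng)
                for cv1_test in inner_folds:
                    cv1_test_set = set(cv1_test)
                    cv1_train = [s for fold in inner_folds for s in fold
                                 if s not in cv1_test_set]
                    yield cv1_train, cv1_test, cv2_test


labels = ["CP"] * 118 + ["HC"] * 86
splits = list(repeated_nested_cv(labels))
```

With these settings the generator produces 10 × 10 × 5 × 10 = 5000 inner splits, and in none of them does a CV2 test subject appear in the CV1 training or test data.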

Data Preprocessing

Our NeuroMiner machine learning preprocessing pipeline consisted of the following steps:

  1. As many machine learning algorithms are sensitive to scale differences between features, we scaled each variable to a 0–1 range within each training sample matrix. The scaling parameters estimated on the training data were then applied to the corresponding inner and outer CV test data.

  2. To avoid the effect of any demographic confounds on the algorithm performance, the scaled feature scores were further corrected for age, gender, education, occupational status, and marital status. Specifically, we removed the variance associated with these demographic variables within each inner and outer CV fold through partial correlations.
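The two preprocessing steps can be sketched under simplifying assumptions. The key point illustrated is that every parameter is estimated on training data and only then applied to held-out data. The covariate step below uses ordinary least-squares residualization against a single covariate as a stand-in for the partial-correlation approach named in the text; all function names and numeric values are hypothetical.

```python
def fit_minmax(train_column):
    """Learn the 0-1 scaling parameters from training values only."""
    low, high = min(train_column), max(train_column)
    return low, (high - low) or 1.0  # guard against constant features


def apply_minmax(column, params):
    """Scale any column (training or test) with the training parameters."""
    low, span = params
    return [(value - low) / span for value in column]


def fit_covariate_model(train_feature, train_covariate):
    """Least-squares fit of a feature on one covariate, training data only."""
    n = len(train_feature)
    mean_x = sum(train_covariate) / n
    mean_y = sum(train_feature) / n
    sxx = sum((x - mean_x) ** 2 for x in train_covariate)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(train_covariate, train_feature))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def remove_covariate(feature, covariate, model):
    """Subtract the covariate-predicted part of the feature."""
    slope, intercept = model
    return [y - (intercept + slope * x) for x, y in zip(covariate, feature)]


# Toy questionnaire scores and ages (hypothetical values)
train_score, train_age = [4, 8, 12, 20], [30, 40, 50, 70]
test_score, test_age = [18], [60]

params = fit_minmax(train_score)
scaled_train = apply_minmax(train_score, params)
scaled_test = apply_minmax(test_score, params)

model = fit_covariate_model(scaled_train, train_age)
clean_test = remove_covariate(scaled_test, test_age, model)
```

Because the test subject never contributes to `params` or `model`, the held-out data cannot influence the transformation applied to it.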

Feature Selection and Machine Learning Algorithm Implementation

Features included in the algorithm underwent a stepwise forward variable selection process [30] using a linear support vector machine (SVM) [31]. Specifically, data entered a greedy forward search wrapper [30], which identifies the most parsimonious subset of variables within the given variable pool, thus providing maximum classification performance with the smallest number of predictive features. The wrapper algorithm used an SVM to evaluate the predictive value of each feature, then extracted the most predictive variable and iterated over the remaining variable pool to select the second-best-performing variable, which was added to the first one. This process was repeated until the optimal variable subspace had been identified. We stopped the variable search when the top 20% of the variables had been extracted by the wrapper, thus allowing us to identify a clinically applicable set of top-performing variables for classification purposes.
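A greedy forward search of this kind can be sketched generically. In the sketch below, `score_subset` stands in for the SVM's inner-cycle performance; the toy scorer, the feature names, the utility values, and the rounding rule used to turn the 20% threshold into a feature count are all assumptions made for illustration.

```python
import math


def greedy_forward_selection(feature_names, score_subset, top_fraction=0.2):
    """Add, one at a time, the feature that most improves score_subset."""
    # Rounding up is an assumption; the text only states "top 20%"
    n_keep = max(1, math.ceil(top_fraction * len(feature_names)))
    selected, remaining = [], list(feature_names)
    while remaining and len(selected) < n_keep:
        # Evaluate each candidate together with the features already chosen
        best = max(remaining, key=lambda name: score_subset(selected + [name]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical per-feature utilities standing in for SVM performance gains
toy_value = {
    "hads_depression": 0.30,
    "hads_anxiety": 0.25,
    "sf36_vitality": 0.28,
    "cpm_raw": 0.05,
    "bmq_overuse": 0.10,
}


def toy_score(subset):
    score = 0.5 + sum(toy_value[name] for name in subset)
    # Penalize redundancy between the two HADS subscales
    if "hads_depression" in subset and "hads_anxiety" in subset:
        score -= 0.20
    return score


selected = greedy_forward_selection(list(toy_value), toy_score, top_fraction=0.4)
```

The toy scorer shows why a wrapper can differ from ranking features one by one: the second-best single feature is skipped when it is redundant with the feature already selected.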

The wrapper-based feature selection was carried out for each CV1 training and test sample and then repeated for every combination of the SVM parameters C (misclassification cost) and γ (kernel width) within a grid defined by the ranges C = [0.0156, 16] and γ = [3.0518 × 10⁻⁵, 8]. In each variable evaluation step in the CV1, the SVM algorithm modeled linear relationships between features and classification labels (HC vs. CP). In the linear kernel space, the SVM optimized a hyperplane that maximized separability between most HC-like and most CP-like subjects (i.e., the support vectors). Based on the trained hyperplane, the algorithm then predicted subjects’ classification (HC vs. CP) of the inner CV1 cycle by projecting its data into the CV2 learned kernel space and measuring their geometric distance to the decision boundary. This resulted in a decision value and a predicted classification label per participant.
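The reported grid endpoints coincide with powers of two, reading the lower γ bound as 3.0518 × 10⁻⁵: 0.0156 ≈ 2⁻⁶, 16 = 2⁴, 3.0518 × 10⁻⁵ ≈ 2⁻¹⁵, and 8 = 2³. A grid of this kind could be generated as below. The exponent step of 2 is an assumption, since the text gives only the range endpoints.

```python
# Assumed exponent step of 2; the source reports only the range endpoints
c_grid = [2.0 ** exponent for exponent in range(-6, 5, 2)]       # 2^-6 ... 2^4
gamma_grid = [2.0 ** exponent for exponent in range(-15, 4, 2)]  # 2^-15 ... 2^3

# Every (C, gamma) pair that the feature-selection wrapper would be run on
grid = [(c, gamma) for c in c_grid for gamma in gamma_grid]
```

Spacing the values exponentially rather than linearly lets a small number of grid points cover several orders of magnitude.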

Investigation of Individual Features’ Relevance Within the Machine Learning Algorithm

To better understand which variables might inform CP and HC classes at the single-subject level, we checked which features were the most reliable. Reliability for each feature is defined in terms of a cross-validation ratio (CVR = mean(w)/standard error(w)) [32]. In this formula, w represents the normalized individual weights from the SVM models generated in the repeated nested CV scheme. Normalization is performed using the Euclidean norm of w, defined as s = w/||w||2 [32]. A positive CVR indicates that higher scores on that feature are reliably associated with the CP class, while a negative CVR indicates that higher scores are reliably associated with the HC class.
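The CVR definition can be written out as a short sketch. It follows the formula in the text (L2-normalize each model's weight vector, then divide the per-feature mean by the per-feature standard error across models). The use of the sample standard deviation for the standard error, the function names, and the toy weight vectors are assumptions.

```python
import math
import statistics


def l2_normalize(weights):
    """Divide a weight vector by its Euclidean norm."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]


def cross_validation_ratio(weight_vectors):
    """weight_vectors holds one SVM weight vector per trained model.

    Returns one CVR value per feature.
    """
    normalized = [l2_normalize(weights) for weights in weight_vectors]
    n_models = len(normalized)
    ratios = []
    # zip(*...) regroups the weights feature by feature across models
    for per_feature in zip(*normalized):
        mean_w = statistics.mean(per_feature)
        standard_error = statistics.stdev(per_feature) / math.sqrt(n_models)
        ratios.append(mean_w / standard_error)
    return ratios


# Three hypothetical models, two features each
toy_weights = [[3.0, -4.0], [4.0, -3.0], [3.5, -3.5]]
cvr = cross_validation_ratio(toy_weights)
```

A feature whose weight keeps the same sign and similar size across models receives a large absolute CVR, whereas a feature whose weight fluctuates around zero receives a CVR close to zero. The sketch does not handle the degenerate case in which a weight is identical across all models, where the standard error is zero.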

Permutation Testing

To assign statistical significance to the observed classification performance, we employed permutation testing [31]. We performed 1000 random permutations of the outcome labels (i.e., HC vs. CP). For each permutation, we retrained all linear SVM models in the repeated nested CV experiment using the respective feature subsets obtained from the observed-label analyses. For each permutation, we accumulated the predictions of the random models into a permuted ensemble prediction for each outer cycle subject. Thus, we built a null distribution of out-of-training classification performance, measured as balanced accuracy (BAC), for the classifier. Finally, we calculated the significance of the observed out-of-training BAC as the number of events where the permuted out-of-training BAC was higher than or equal to the observed BAC divided by the number of permutations performed. The significance of the model was determined at α = 0.05.
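The final step of the permutation test reduces to simple arithmetic, sketched below. Balanced accuracy is computed here as the mean of sensitivity and specificity. The function names, the toy labels, and the list of permuted BAC values are hypothetical; in the actual procedure each permuted BAC would come from retraining the models on shuffled labels.

```python
def balanced_accuracy(true_labels, predicted_labels, positive="CP"):
    """Mean of sensitivity and specificity for a two-class problem."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    true_neg = sum(t != positive and p != positive for t, p in pairs)
    n_pos = sum(t == positive for t in true_labels)
    n_neg = len(true_labels) - n_pos
    return (true_pos / n_pos + true_neg / n_neg) / 2


def permutation_p_value(observed_bac, permuted_bacs):
    """Share of permutations whose BAC is at least the observed BAC."""
    exceed = sum(bac >= observed_bac for bac in permuted_bacs)
    return exceed / len(permuted_bacs)


# Toy out-of-training predictions for eight subjects
true = ["CP", "CP", "CP", "CP", "HC", "HC", "HC", "HC"]
predicted = ["CP", "CP", "CP", "HC", "HC", "HC", "HC", "CP"]
observed = balanced_accuracy(true, predicted)

# Hypothetical BAC values from ten label permutations
null_bacs = [0.50, 0.625, 0.375, 0.75, 0.50, 0.25, 0.625, 0.50, 0.375, 0.50]
p_value = permutation_p_value(observed, null_bacs)
```

With 1000 permutations, this ratio is zero when no permuted BAC reaches the observed one, which is conventionally reported as p < 0.001.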

Results

Demographic Differences Between Samples

HC and CP individuals did not differ by age, gender, or education level (all p > 0.05, Table 1). However, marital status and occupational status were differentially distributed across HC and CP individuals (both p = 0.001, Table 1). With regard to the features that entered the machine learning algorithm, CP had higher scores than HC on both the HADS anxiety and depression subscales and on BMQ harm (all p = 0.001, Table 2). Conversely, HC had higher scores than CP on the CPM raw and corrected scores (p = 0.007 and p = 0.004, respectively; Table 2) and in all the SF-36 subscales entering the algorithm (all p = 0.001, Table 2).

Machine Learning Results

The psycho-physical classifier correctly discriminated CP from HC with a cross-validated balanced accuracy (BAC) of 86.5% and was significant at p < 0.001, with an area under the curve (AUC) of 0.92. Detailed classification metrics are reported in Table 3. Out of all 13 features originally included in the model, those with the highest positive CVR were HADS-depression, HADS-anxiety, and BMQ-overuse, while those with the highest negative CVR were SF-36 vitality, physical functioning, and physical role functioning (Table 4, Fig. 1).

Table 3 Validated classification performance of the classifier trained based on psycho-physical assessments within a repeated nested cross-validation framework

Table 4 Cross-validation ratio (CVR) score of each feature within the machine learning algorithm, representing its reliability

Fig. 1 Depiction of the cross-validation ratio (CVR) scores, representing the reliability of each feature included in the algorithm. A positive CVR indicates that higher scores on that feature are reliably associated with chronic pain (CP) individuals, while a negative CVR indicates that higher scores are reliably associated with healthy controls (HC)

Discussion

In the current study, we aimed at building a multivariate classification model through machine learning techniques that was able to discriminate between HC and CP individuals. In a biopsychosocial perspective, we built this classification model based on a wide variety of psychological, physical, and mental health-related features, as well as cognitive features, while strictly controlling for any potential demographic confounds (age, gender, education, marital status, and occupational status).

Before entering the machine learning framework, we observed that CP had higher HADS and BMQ harm subscale scores compared with HC. Concerning HADS, the higher depression scores we found in CP individuals compared with HC are consistent with a large body of literature demonstrating that CP individuals very often develop depression after the CP diagnosis [33], to the extent that approximately 85% of patients with CP suffer from severe depression [34]. On the other hand, the higher anxiety scores in CP compared with HC match previous findings showing that fear and anxiety levels in CP individuals are associated with greater pain-related perceived disability via avoidance, cognitive preoccupation, and stress-related muscle activity [10]. Higher BMQ scores in CP compared with HC may reflect the fact that CP individuals more strongly believe that medicines may be harmful [34]. Although CP individuals were under analgesic medication, this finding should not be affected by ongoing treatment, given that the BMQ-General section is designed to capture overall perceptions of medication in general, regardless of any ongoing chronic illness (in contrast to BMQ-Specific, which indeed was not included in the machine learning algorithm).

Our results also reveal that HC have higher CPM scores than CP, thus showing higher overall cognitive ability. This finding is consistent with several previous studies (for a review, see [2]). However, recent views have proposed that CP may not be directly associated with cognition itself but may influence it via the other comorbidities that are frequently associated with CP, such as anxiety, depression, or emotional dysregulation. If validated, this hypothesis would imply future challenges for new targets within CP early intervention programs [8].

Furthermore, in all the SF-36 sub-dimensions, we found that HC had higher scores than CP. Indeed, the CP state is very often associated with reduced physical activity [4], and this reduction is associated with the intensity, duration, and location of pain. This, in turn, seems to affect patients’ overall quality of life [35], including emotional, psychological, and social functioning, which indeed were found to be lower in our CP individuals compared with HC. Of note, the smallest CP-HC difference across the SF-36 sub-dimensions was found for the “general health perceptions” subscale. Indeed, previous studies found that CP individuals are often not conscious of the degree of physical impairment they experience, tend to overestimate their ability, and do not feel impaired [4, 36]. This may have potential implications for CP early intervention strategies, since making patients more conscious of their impairments and behaviors can potentially promote a healthier and more active lifestyle.

Our machine learning results show for the first time that a pattern of psychological and physical health indices can discriminate between CP and HC with high balanced accuracy and significance. Indeed, when pooling a set of variables clearly associated with the CP experience based on univariate studies [4], and using them to discriminate CP from HC, we are able to correctly discriminate a CP individual from a HC at the single-subject level in 86.5% of cases within our repeated-nested cross-validation framework. Notably, as in previous studies using the same SVM technique [11, 15], we employed a stringent separation of training and test sets, and a robust, repeated-nested CV scheme. These methodological choices are in line with recent recommendations (for a review, see [12]), which noted that the gold-standard CV scheme ensuring the highest degree of reliability and generalizability of machine learning findings, in the absence of external replication samples, is nested CV. Thus, the psycho-physical HC versus CP classification model we have developed not only is highly accurate but also shows a very good extent of estimated generalizability.

The high accuracy and estimated generalizability of our SVM algorithm further validate our biopsychosocial model’s application in clinical practice, as it demonstrates (1) how several sensory, cognitive/affective, and interpersonal factors together contribute to CP syndrome, and (2) that CP is associated with several psychological and physical processes that, in turn, affect the pain experience [36]. Specifically, we observed that, within this pool of variables clearly associated with the CP experience based on univariate studies [4], not all the features have the same discriminatory power; some are highly characteristic of only the CP status, and others of only the HC status. Indeed, within our algorithm, higher HADS-depression, HADS-anxiety, and BMQ-overuse scores are the most relevant features in discriminating a CP from a HC, thus showing higher sensitivity potential, while higher SF-36 vitality, physical functioning, and physical role functioning scores are the most reliable features for discriminating a HC from a CP, thus showing higher specificity potential. HADS findings further confirm the tight link between CP and depression. Indeed, depression and CP seem to influence each other, their respective development, and their respective severity [33]. The prognostic relevance of depression for CP is further supported by recent views highlighting that depressed CP individuals have a poorer prognosis than non-depressed CP individuals [36]. On the other hand, CP-related anxiety has been shown to influence the pain experience in multiple ways. Indeed, pain may cause feelings of anxiety, which in turn may increase individual pain sensitivity and make the experience of pain more persistent [37]. Furthermore, anxiety and CP share common cognitive and behavioral processes, such as increased attention towards threatening stimuli and avoidance of physical exertion [38]. On the prognostic level, pain-related fear and anxiety have been previously associated with greater disability and persistent pain experience [39]. The relevant role of the BMQ-overuse subscale in our HC-CP psycho-physical algorithm as a psychological feature which is more “CP-like” than “HC-like” is also not surprising. Indeed, as recently proposed [34], CP individuals may consider that their condition is irreversible and that, regardless of whether any therapy could be helpful, a relapse might always occur. This view is also coherent with the increased attention towards threats characteristic of CP individuals [38].

On the other hand, findings revealed that the features contributing the most to the model’s accuracy while being more prototypical of HC than of CP were those related to physical functioning, occupational functioning, and vitality (i.e., perceived energy and fatigue). These findings further confirm that, compared with HC, who do not show any significant limitations in these areas, the physical and occupational functioning of CP individuals and their perceived energy are significantly affected by the pain experience itself. Previous studies have shown that pain is significantly associated with both psychological and physical dimensions of health-related quality of life [4], and that in each health-related quality of life dimension, CP individuals score significantly worse than HC [40]. However, consistent with our findings, the greater impact of pain is on physical, rather than on mental, quality of life indices [41]. As for occupational functioning, its relevance in our SVM algorithm is coherent with the fact that, unlike HC, CP individuals often experience difficulties in their workplaces. Indeed, CP is often associated with higher absenteeism, early retirement, and more days of sick leave [42], especially for back pain and rheumatic diseases [4]. Furthermore, CP individuals are often forced to change their duties at their workplace due to their physical and psychological symptoms, and this may result in the loss of their jobs [41].

Taking these findings together, it seems that our algorithm identified a pattern of psychological feelings, disbeliefs, and cognitive distortions that are highly characteristic of CP individuals and reliably discriminate them from HC. On the other hand, it identified a pattern of physical, occupational, and energy-related features that are highly characteristic of HC and significantly less present in CP individuals.

Limitations

This study has certain limitations. Despite the stringent training and test data separation in our SVM algorithm and the nested CV employed, replication of our findings in independent and geographically different samples is needed, as external validation is considered the gold standard in the field for assessing a model’s effective (and not just estimated) generalizability. Moreover, it should be noted that our HC sample was composed mainly of individuals linked to the social circle of CP individuals’ caregivers. Although this allowed us to build a HC sample with similar age and education level relative to those of our CP sample, it does not guarantee a purely random sampling of the HC group. Larger and randomly sampled external validation groups are needed to further test the generalizability of our psycho-physical machine learning algorithm. Another limitation is that further clinically relevant aspects of the pain experience, such as pain severity, duration of drug treatment, or history of treatments tried, were not considered in this study. While this is probably a consequence of the broad inclusion/exclusion criteria employed, the general aim of the research project was to include CP individuals based solely on diagnosis, irrespective of their past pain experiences. Nevertheless, we think that future studies investigating the potential association between the single-subject-level decision scores generated through our machine learning algorithm and these highly relevant clinical aspects of the pain experience are warranted to provide deeper insights into the potential translation of our psycho-physical classification model into clinical practice. Similarly, our neuropsychological findings are limited by the fact that, in our study, the cognitive domain is represented solely by the CPM test. Although beyond the scope of this study, a better characterization of individual neurocognitive functions through a broader battery assessing specific (rather than general) cognitive functioning and sub-domains would have been more informative about the potential existence of specific neuropsychological assessments that could capture core pain-related cognitive deficits. Future studies in this direction are needed.

More importantly, the cross-sectional nature of this study does not allow us to give any prognostic insight into CP based on these findings, or to fully understand their potential for translation into clinical practice. Longitudinal studies are warranted to provide machine learning-based prognostic information.

Conclusions

Our findings suggest that, using psychological and physical assessments, it is possible to discriminate CP from HC with high reliability via (1) a pattern of psychological symptoms (identified through HADS subscales) and cognitive beliefs (identified through BMQ subscales) characterizing CP, and (2) a pattern of intact physical functions (identified through SF-36 subscales) characterizing HC. We think that our algorithm provides important and novel insights for clinical practice. Indeed, if externally validated in geographically diverse cohorts and with longitudinal information, the investigation of such psycho-physical impairments through these subscales could be prioritized, helping to better tailor early identification and intervention strategies in CP through:

  • constant monitoring of the onset and the evolution of symptoms of depression and anxiety and of cognitive beliefs and disbeliefs about medicine and pharmacological treatments; and

  • active promotion of physical health strategies by specifically targeting occupational, physical, and vitality impairments in CP.

This would potentially lead to improved quality of life in CP individuals and to a shorter, more transient, less burdensome pain experience.