Key Summary Points

Postherpetic neuralgia (PHN) is a kind of intractable pain.

Studies have shown that early pain intervention can reduce the incidence and severity of PHN.

There are clear risk factors associated with PHN. Can we predict the probability of PHN in a patient with shingles by analyzing risk factors?

A statistical model for predicting PHN was obtained through machine learning by logistic regression and random forest analysis.

For patients at high risk of PHN, we can advise them to undergo pain intervention as soon as possible.

Digital Features

This article is published with digital features to facilitate understanding of the article. To view digital features for this article go to https://doi.org/10.6084/m9.figshare.12866024.

Introduction

Postherpetic neuralgia (PHN) is a kind of neuropathic pain secondary to herpes zoster infection. In the past, PHN was defined as pain in the herpes area that persists for > 3 months after rash healing [1]. However, many clinicians have realized that this definition may delay the treatment of PHN, so the definition has been revised with a further distinction: pain present within 1 month from the onset of rash is defined as acute herpetic neuralgia; pain present between 1–3 months is defined as subacute herpetic neuralgia; pain persisting > 90 days from the onset of herpes zoster is defined as PHN [2]. This is currently the most widely accepted definition. In this study, we adopted this kind of classification to define PHN. However, some clinicians believe that this diagnosis is not sensitive enough to recognize PHN, and they use more aggressive diagnostic criteria in which persistent pain 1 month after rash healing can be considered PHN, so that interventions can be used to control pain earlier [3]. Studies have shown that early pain intervention such as epidural and paravertebral block, stellate ganglion block and percutaneous electrical nerve stimulation can prevent PHN [4], so being able to predict PHN in a patient with acute herpetic neuralgia would be helpful to both patients and clinicians.

Many studies have revealed the risk factors associated with PHN, including aging, acute pain intensity, underlying diseases, whether antiviral treatment was given, immunosuppressive state, etc. [5, 6]. Considering any single factor, elderly people seem to be more susceptible to PHN, and patients with more severe acute pain are more likely to develop neuralgia sequelae [7]. Herpes zoster patients with underlying diseases, especially cancer and diabetes, are more prone to PHN [8, 9], which is related to the immunosuppression caused by radiochemotherapy and the abnormal immune state of the patients themselves [8]. Since immunosuppression may be related to the occurrence of PHN, some clinicians believe that the use of glucocorticoids in the treatment of acute herpes zoster can also increase the incidence of PHN [10]. Although we can determine which factors may be associated with the occurrence of PHN, we cannot tell for sure whether a patient with acute herpes zoster will develop PHN. In this study, we use machine learning to construct a statistical model to predict PHN.

Methods

Data Collection

We reviewed 502 outpatients, inpatients, and online patients with a history of herpes zoster seen in the pain clinic of China-Japan Friendship Hospital since 2017. We extracted the necessary traceable features as statistical data. The choice of features was based on literature reviews [5, 6, 11] and our clinical experience and included gender, age, numeric rating scale (NRS) score, rash site, Charlson Comorbidity Index (CCI) score, antiviral treatment and immunosuppression. Age refers to the age at the time of rash onset. The NRS score describes the intensity of acute herpetic neuralgia and takes one of 11 integer values from 0 to 10; the larger the number, the more severe the pain. The CCI score is one of the most widely used comorbidity scoring systems and is based on 19 underlying diseases including myocardial infarction, chronic obstructive pulmonary disease, leukemia, and so on. Higher scores indicate more severe comorbidities [12]. Immunosuppression here refers to patients with HIV, leukemia or lymphoma in the previous 2 years, myeloma or other unspecified cellular immune deficiencies at any time, or high-dose oral corticosteroid use during acute neuralgia. The protocol was approved by the ethics committee of China-Japan Friendship Hospital, Chaoyang District, Beijing. Patient consent was waived because there was no breach of patients' privacy, no risk to patient safety, no collection of biologic samples and no violation of patients’ rights.

Data Processing

To make data easy to calculate, we did the following processing: in terms of gender, we marked “1” for male and “2” for female; we divided different ages into four layers, which were represented by consecutive numbers. Less than or equal to 20 years old was recorded as “1,” 21–40 years old was recorded as “2,” 41–60 years old was recorded as “3,” and > 60 years old was recorded as “4.” As for rash site, we recorded the head and face as “1,” the upper limbs as “2,” the trunk as “3” and the lower limbs and perineum as “4.” For the remaining features, we denoted “no” as “1” and “yes” as “2.” To describe whether the patient had PHN, we marked the negative as “0” and the positive as “1” according to general practices [13].
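The coding scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, dictionary keys and record layout are hypothetical, and passing the NRS and CCI scores through as plain numbers is an assumption, since the text does not say how these two scores were coded.

```python
# Hypothetical helpers; the integer codes follow the scheme described in the text.
RASH_SITE_CODES = {
    "head_face": 1,
    "upper_limb": 2,
    "trunk": 3,
    "lower_limb_perineum": 4,
}


def encode_age(age_years):
    """Map age at rash onset to one of the four age layers."""
    if age_years <= 20:
        return 1
    if age_years <= 40:
        return 2
    if age_years <= 60:
        return 3
    return 4


def encode_yes_no(flag):
    """'no' is coded as 1 and 'yes' as 2."""
    return 2 if flag else 1


def encode_patient(record):
    """Turn one raw patient record (a dict) into coded features."""
    return {
        "gender": 1 if record["gender"] == "male" else 2,
        "age": encode_age(record["age_years"]),
        "nrs": record["nrs"],  # assumption: kept as the 0-10 score
        "rash_site": RASH_SITE_CODES[record["rash_site"]],
        "cci": record["cci"],  # assumption: kept as the raw score
        "antiviral": encode_yes_no(record["antiviral"]),
        "immunosuppression": encode_yes_no(record["immunosuppression"]),
        "phn": 1 if record["phn"] else 0,  # outcome: 0 negative, 1 positive
    }


coded = encode_patient({
    "gender": "male", "age_years": 70, "nrs": 6, "rash_site": "upper_limb",
    "cci": 3, "antiviral": True, "immunosuppression": True, "phn": True,
})
```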

Feature Selection

In feature selection, we analyzed the seven characteristics by univariate analysis to determine the risk factors related to the occurrence of PHN. To avoid missing some risk factors, we incorporated characteristics that show a correlation with p < 0.10 rather than p < 0.05 into the following machine learning.
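The screening rule can be illustrated with a small sketch. The text does not name the univariate test, so the Pearson chi-square test below is an assumption, as are the function name and the 2 × 2 counts, which are back-calculated from the percentages reported in the Results rather than taken from the study tables.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) for the 2 x 2 table [[a, b], [c, d]].
    Returns (statistic, p_value)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Illustrative counts (assumption): PHN / no PHN in males, then females.
chi2, p_value = chi_square_2x2(56, 181, 69, 196)

# Screening rule from the text: keep a feature only if p < 0.10.
keep_gender = p_value < 0.10
```

With these illustrative counts gender is not retained, which agrees with the outcome of the feature selection reported in the Results.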

Machine Learning

Given that the prediction of PHN is a binary classification problem, logistic regression is an appropriate machine learning method [13]. The mathematical expression of logistic regression is: $p = 1 / (1 + e^{-t})$, $t = b + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$, where $p$ is the probability of getting PHN, $x_n$ is the value of each characteristic, and $w_n$ and $b$ are the coefficients and intercept calculated through logistic regression. We can calculate the probability of PHN in a person with shingles by assigning values to each characteristic. If the probability is > 0.5, the patient is classified as PHN positive.
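The classification rule can be restated in terms of the linear score $t$. The short derivation below uses only the logistic expression given above and standard algebra; it is offered as a reading aid and is not part of the original analysis.

```latex
\begin{align}
p &= \frac{1}{1 + e^{-t}}, \qquad t = b + \sum_{i=1}^{n} w_i x_i \\
\ln\frac{p}{1 - p} &= t \\
p > 0.5 &\iff t > 0
\end{align}
```

Because $t$ is the log-odds of PHN, each coefficient $w_i$ is the change in log-odds for a one-unit increase in $x_i$, and $e^{w_i}$ is the corresponding odds ratio.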

In addition, we used the random forest algorithm for comparison with logistic regression. Random forest is an ensemble algorithm in machine learning, which has been applied in medical research in recent years, e.g., in the diagnosis of acute appendicitis [14], classification of pulmonary nodules [15] and prediction of Alzheimer's disease [16]. Not only can it be trained to predict new samples, but it can also calculate the importance of each risk factor to PHN. We compared the two models by accuracy and area under the curve (AUC) value. Both logistic regression and random forest were implemented using the scikit-learn library under Python 3.8. We set “random_state” = 10; in random forest classification we set “n_estimators” = 100, “max_depth” = auto and “min_samples_leaf” = 1.
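A setup of this kind might look as follows in scikit-learn. This is a sketch under stated assumptions, not the authors' script: the data are synthetic stand-ins generated on the spot, `max_depth=None` (scikit-learn's default, unlimited depth) is used because `max_depth` accepts only an integer or `None`, `max_iter=1000` is added only to help convergence, and the models are scored on the same data they were fitted on because the text does not describe a train/test split.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

# Synthetic stand-in data (NOT the study data): 200 patients, 6 coded features.
rng = np.random.RandomState(10)
n = 200
X = np.column_stack([
    rng.randint(1, 5, n),   # age layer, 1-4
    rng.randint(0, 11, n),  # NRS score, 0-10
    rng.randint(0, 7, n),   # CCI score (range chosen for illustration)
    rng.randint(1, 5, n),   # rash site, 1-4
    rng.randint(1, 3, n),   # antiviral treatment, 1 = no, 2 = yes
    rng.randint(1, 3, n),   # immunosuppression, 1 = no, 2 = yes
])
# Synthetic outcome loosely driven by NRS score and immunosuppression.
y = ((X[:, 1] + 2 * X[:, 5] + rng.normal(0, 1.5, n)) > 9).astype(int)

models = {
    "logistic_regression": LogisticRegression(random_state=10, max_iter=1000),
    "random_forest": RandomForestClassifier(
        n_estimators=100, max_depth=None, min_samples_leaf=1, random_state=10
    ),
}

results = {}
for name, model in models.items():
    model.fit(X, y)
    prob = model.predict_proba(X)[:, 1]  # predicted probability of PHN
    results[name] = {
        "accuracy": accuracy_score(y, model.predict(X)),
        "auc": roc_auc_score(y, prob),
    }

# Relative importance of each feature in the random forest (sums to 1).
importances = models["random_forest"].feature_importances_
```

Scores computed on the fitting data tend to be optimistic, so a held-out split or cross-validation would give a more conservative comparison of the two models.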

Model Testing

A machine learning model can reproduce the outcomes of past cases from the existing data. To verify its ability to predict unseen cases, we conducted a small test in the next 60 newly diagnosed patients with herpes zoster. We used the better of the logistic regression and random forest models to predict whether the patients would develop PHN and then followed them for 3 months for confirmation.

Results

Basic Information

A total of 502 patients with herpes zoster were reviewed, 125 of whom had PHN, with an incidence rate of 24.90%. The prevalence was 23.63% in males and 26.04% in females, with no significant difference (p = 0.44). The basic information for each relevant factor is shown in Table 1.

Table 1 Basic information

Feature Selection

Through univariate analysis, we found that gender had no relation with the occurrence of PHN (p = 0.60). Age, NRS score, rash site, CCI score, antiviral therapy and immunosuppression were related to PHN with p < 0.10; these six factors were used in machine learning.

Logistic Regression

In logistic regression, the coefficient of “rash site” is negative, whereas the coefficients of the other five risk factors are all positive. The percent of cases correctly classified was 92.83% with an AUC value of 0.98 (95% CI 0.96–0.99). The results of logistic regression are shown in Table 2.

Table 2 Result of logistic regression

Random Forest

Random forest classification shows the influence of each feature on the results directly. The importance of age, rash site, NRS score, CCI score, antiviral therapy and immunosuppression to PHN was calculated as 0.10, 0.13, 0.31, 0.24, 0.01 and 0.21, respectively, as shown in Fig. 1. The prediction accuracy was 96.24%.

Fig. 1 Importance of each factor

Comparison Between Logistic Regression and Random Forest

The comparison of the prediction results between logistic regression and random forest is shown in Table 3. It can be seen that the random forest model is superior to the logistic regression model in the results of each evaluation index (p = 0.03).

Table 3 Comparison between random forest and logistic regression

Test Result

We applied the model to the next 60 patients who were initially diagnosed with herpes zoster. PHN occurred in 19 of the 60 patients, an incidence rate of 31.67%. Positive results were correctly predicted in 17 cases. The sensitivity was 89.47% (95% CI 66.86–98.70%), and the specificity was 87.80% (95% CI 73.80–95.92%). The total accuracy was 88.33% (95% CI 77.43–95.18%). The results are shown in Table 4.
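The reported rates can be reproduced from a 2 × 2 confusion matrix. In the sketch below the function name is hypothetical. The true-positive and false-negative counts come from the text (17 of 19 PHN cases predicted correctly), while the true-negative and false-positive counts (36 and 5) are derived here from the 41 patients without PHN and the reported specificity, not quoted from Table 4.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # correctly predicted PHN cases
        "specificity": tn / (tn + fp),  # correctly predicted non-PHN cases
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# tp and fn from the text; tn and fp derived from the reported specificity.
metrics = diagnostic_metrics(tp=17, fn=2, tn=36, fp=5)
percentages = {name: round(value * 100, 2) for name, value in metrics.items()}
```

These counts give 89.47%, 87.80% and 88.33%, matching the reported point estimates; the confidence intervals are not recomputed here.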

Table 4 Predicted results and real outcomes

Discussion

Meaning of the Prediction Model

Many studies have analyzed the risk factors associated with PHN using epidemiologic surveys. These studies help us to understand what characteristics make patients with shingles more susceptible to PHN. In this study, by analyzing previous cases and extracting relevant data, we established a statistical model that uses a machine learning algorithm to predict the probability of PHN in patients with herpes zoster. The coefficients of logistic regression can be used to calculate the probability of PHN, and the prediction formula obtained by logistic regression can be expressed as: $p = 1 / (1 + e^{-t})$, $t = -28.91 + 1.49x_1 + 3.34x_2 + 0.63x_3 - 1.45x_4 + 1.75x_5 + 1.79x_6$, where $x_1$ to $x_6$ refer to age, NRS score, CCI score, rash site, antiviral therapy and immunosuppression, respectively. For example, a 70-year-old man with a history of lymphoma gets herpes zoster in the upper limb. The NRS score is 6, and he receives full antiviral treatment; the probability of getting PHN is 0.96.

The logistic regression model is relatively simple and interpretable compared to random forest. However, in this study, its prediction accuracy is lower than that of random forest. Logistic regression is suitable for dealing with independent risk factors, but there are connections between age and CCI score and between CCI score and immunosuppression: older people may have more comorbidities and be prone to immunosuppression. Random forest is a powerful ensemble algorithm. In addition to the high prediction accuracy, it shows the importance of each risk factor for PHN. The two most important factors were “NRS score” and “immunosuppression,” which is consistent with previous literature reviews and our clinical observations. The CCI score is less important, perhaps because part of its importance is borne by immunosuppression. For example, some patients have high CCI scores because of chronic heart disease (CHD) or chronic kidney disease (CKD) rather than leukemia or lymphoma; however, CHD and CKD do not cause immunosuppression and have little relationship with PHN, so the CCI score is not as important as immunosuppression. Among all the related factors, antiviral therapy is the least important, with an importance of 0.01, but this does not mean that antiviral treatment is not important for PHN prevention. A possible reason is that most of the patients with herpes zoster received antiviral treatment regardless of whether they went on to develop PHN.

This model is a tool for identifying patients at high risk of PHN; patients with a high probability of PHN can receive interventional or non-interventional treatment of pain during acute neuralgia to reduce the incidence and intensity of PHN.
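The worked example for the 70-year-old patient can be checked numerically. The sketch below makes several assumptions that should be read with care: the intercept and the rash-site coefficient are taken as negative (the rash-site sign follows the Results, and a negative intercept is needed to reproduce the reported probability), the feature names are hypothetical, and the CCI score of 3 is assumed because the text does not state the patient's CCI.

```python
import math

# Coefficients as reported in the Discussion; signs of the intercept and of
# the rash-site term are assumptions (see the lead-in above).
INTERCEPT = -28.91
COEFFICIENTS = {
    "age": 1.49,
    "nrs": 3.34,
    "cci": 0.63,
    "rash_site": -1.45,
    "antiviral": 1.75,
    "immunosuppression": 1.79,
}


def phn_probability(features):
    """Logistic model: p = 1 / (1 + exp(-t)), t = intercept + sum(w_i * x_i)."""
    t = INTERCEPT + sum(COEFFICIENTS[k] * features[k] for k in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-t))


# 70 years old -> age layer 4; NRS 6; upper limb -> 2; antiviral yes -> 2;
# lymphoma -> immunosuppression yes -> 2; CCI = 3 is an assumed value.
example = {"age": 4, "nrs": 6, "cci": 3, "rash_site": 2,
           "antiviral": 2, "immunosuppression": 2}
p = phn_probability(example)
is_phn_positive = p > 0.5  # classification threshold from the Methods
```

Under these assumptions the computed probability is approximately 0.96, in line with the value quoted for this patient.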

Risk Factors for PHN

In addition to the risk factors mentioned in this study, some studies have shown that the area of the skin lesion and the emotional state of patients are also related to the occurrence of PHN [17, 18]; some studies even claim that lack of sleep is a risk factor for PHN [19]. As can be seen in Table 2, the coefficient of “rash site” is negative, indicating a negative correlation between its value and PHN, i.e., the closer the herpes is to the head, the more likely PHN is to develop. This may be because there is a denser neural network in the head and upper limbs, and more neurons are damaged by the varicella zoster virus (VZV). Other factors are positively correlated with PHN. We can see that people > 60 years old have a greater chance of developing PHN from shingles. One viewpoint is that the immune function of elderly patients declines, so VZV replication is active, resulting in severe nerve damage, and the nervous system of elderly people is less able to repair the damage, leading to susceptibility to PHN. For a similar reason, patients with tumors, diabetes and immunosuppressive states are more likely to get PHN.

Why Should We Pay Attention to PHN?

PHN and herpes zoster are two completely different diseases. Most patients with herpes zoster, especially young people, heal with few sequelae. PHN is a typical kind of neuropathic pain with hyperpathia and allodynia. Patients with PHN report decreased quality of life and interference with activities of daily living that may affect physical, psychologic and social aspects of their lives as well as their ability to function [20]. What is worse, treatment options for PHN are limited, and the clinical treatment effect is rarely satisfactory. Therefore, how to control the occurrence of PHN is a very challenging topic; some studies have shown that vaccines may reduce the incidence of PHN [21]. Some studies suggest that early interventions, such as continuous epidural block [22], stellate ganglion block [23] and subcutaneous injection of triamcinolone and lidocaine [24], can help to prevent PHN. Therefore, to control PHN, early detection and early intervention can make a big difference. If a patient has a high risk of PHN, we can recommend early interventions at the time of the first visit to avoid the occurrence of PHN or to reduce the intensity of PHN.

Limitations and Deficiencies

This is a retrospective study that has certain limitations. The first is that the sample size included in the study is not large enough; with more sample data, the model is more likely to reflect the real situation. The second is that the feature data collected may not be comprehensive. A growing number of risk factors have been found to be associated with PHN, for example, the skin lesion area. The data we collected were mainly based on clinical characteristics and lacked objective laboratory examination indicators such as blood biochemistry and imaging tests. Studies have shown that patients with PHN do have neurologic imaging changes [25]; only by collecting as many relevant factors as possible can we get the model that is closest to reality. The third is that a single database does not necessarily reflect all cases. The characteristics of cases vary from region to region. For example, in China, the prevalence rate of PHN in patients with herpes zoster is 29.78% [26], and in Europe, it is 5.82% [6]. Therefore, clinicians can set up their own database according to their actual situation. In addition, pain was the chief complaint in all the included cases. Some patients with shingles do not have pain, and these patients were not included in the study, although this is uncommon.

Conclusions

Early pain intervention is an important way to prevent PHN, so determining which patients are prone to PHN is equally important. We can predict the probability of PHN in patients with shingles through machine learning. The most commonly used machine learning methods are logistic regression and random forest, each with its advantages and disadvantages. By learning comprehensively from previous data, we can predict the outcome of unknown cases.