Background

COVID-19 emerged at the end of 2019 and spread globally. In March 2020, the WHO declared it a pandemic [1], and it has since led to substantial years of life lost [2] and excess mortality [3]. Smoking is closely related to COVID-19. On the one hand, smoking has been shown to upregulate ACE2 expression, increasing susceptibility to COVID-19 [4]. On the other hand, COVID-19 severity is significantly higher in smokers than in non-smokers [5]. Therefore, it is necessary to reduce smoking behavior to promote health during the COVID-19 outbreak.

Previous studies have reported changes in smoking behavior during the COVID-19 pandemic and identified influencing factors. Some suggest that smoking behavior decreased during the pandemic because of concerns about the perceived harm of smoking during COVID-19 [6], difficulties purchasing cigarettes during pandemic-related lockdowns, and the inability to smoke in public places under mask-wearing requirements [7]. However, other studies have indicated a significant increase in smoking behavior during COVID-19 due to anxiety, depression, stress, and other factors [8, 9]. Because so many factors influence smoking during COVID-19, it is necessary to identify high-risk populations and to target the most significant influencing factors in order to reduce smoking behavior. However, none of these studies has investigated the primary influencing factors and high-risk populations for smoking behavior during the COVID-19 pandemic.

Classification and Regression Tree (CART) analysis is a decision tree method developed by Breiman and colleagues. CART can identify the factors that most strongly influence relative risk, and it explores the interactions between the most critical factor and the other influencing factors to form the branches of the tree, dividing the population into high-risk subgroups [10]. It is a nonparametric procedure that begins tree development by examining all predictor variables and selecting the variable (parent node) that best predicts the desired classification. The data in this parent node are divided into two groups (child nodes): one predicts the response variable classification, and the other does not. This binary recursive splitting process is repeated for each child node until further splitting is no longer possible [11]. As CART has developed over the years, it has been increasingly used in the medical field [12,13,14,15]. In smoking research, it has mainly been used to identify high-risk populations for the use of tobacco substitutes [16], to find combinations of risk factors for smoking and the strongest predictive indicators [17], and to predict smoking cessation outcomes [18].
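The binary recursive splitting described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study (the analysis was run in R, and the splitting criterion is not stated here); Gini impurity is assumed as the criterion, and the function names and toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split.

    A split sends rows with row[f] < threshold to the left child and the rest
    to the right child. Returns None when no split separates the rows.
    """
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] < threshold]
            right = [y for row, y in zip(rows, labels) if row[f] >= threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best


def grow(rows, labels, depth=0, max_depth=3):
    """Split recursively until a node is pure, unsplittable, or max_depth is reached."""
    split = None
    if depth < max_depth and gini(labels) > 0:
        split = best_split(rows, labels)
    if split is None:
        # Terminal node: report the majority class and the node size.
        return {"predict": Counter(labels).most_common(1)[0][0], "n": len(labels)}
    f, threshold, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] < threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] >= threshold]
    return {
        "feature": f,
        "threshold": threshold,
        "left": grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


# Hypothetical toy data: one predictor, binary outcome.
toy_rows = [[1], [2], [3], [4]]
toy_labels = [0, 0, 1, 1]
toy_tree = grow(toy_rows, toy_labels)
```

In this toy case the root is split at a threshold of 3 on the single predictor, and both child nodes are pure, so splitting stops after one level.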

The present study

Overall, smoking is a risk factor for COVID-19 infection and severity. Prior studies have analyzed the factors influencing smoking behavior during the pandemic, but they only explored the relationship between influencing factors and changes in smoking behavior. This study aims to address that limitation. Specifically, CART analysis was used to identify the factor that most strongly influences smoking behavior in the population and to analyze the interactions between this factor and other influencing factors, in order to identify high-risk populations for increased smoking behavior.

Method

Data and procedure

The data used in this study were collected in 23 provinces, 5 autonomous regions, and 4 municipalities directly under the central government of China from June 20, 2022, to August 31, 2022. At that time, China was still experiencing the peak of the COVID-19 pandemic, with an increase of 442 to 77,402 cases per day [19]. During the investigation, China implemented a dynamic “Zero-COVID” policy, taking prompt action to contain local outbreaks of COVID-19 [20]. The specific measures included medical isolation of those who had been in close contact with confirmed cases; large-scale nucleic acid testing; citywide home quarantine; the use of electronic health codes when entering public places; travel restrictions; and advocacy of mask-wearing in public spaces. In certain situations, staff reminded individuals to wear masks, and those who did not were prohibited from entering. China enforced the policy rigorously, and it was well implemented [21, 22].

The survey combined equal-probability and non-equal-probability sampling: equal-probability (stratified) sampling at the provincial, municipal, district, township/subdistrict, and community/village levels, and non-equal-probability (quota) sampling from the community/village level to the individual level. At least one surveyor or a panel of surveyors was recruited in each city. Investigators set up questionnaire points at health service centers or relevant health service stations in the sampled communities under their responsibility to conduct face-to-face surveys. If face-to-face surveys could not be conducted because of the epidemic, the investigators used the online Questionnaire Star platform (https://www.wjx.cn/) to distribute the electronic questionnaire to each person. For each participating respondent, the investigators recorded the questionnaire number issued to that person. Subjects were included in the study if they were ≥ 12 years old, provided written informed consent, and volunteered to participate in the study. A total of 23,414 questionnaires were collected. After duplicates, missing values, and outliers with logical problems were identified and removed, a final sample of 21,916 was obtained, for a valid response rate of 93.6%.

Variables

Characteristic variable

The characteristic variables in this study included respondents’ basic information (age stage, gender, education, and current work status), family characteristics (family income), personal health status (chronic disease), COVID-19-related variables (impact of lockdown on livelihoods, lockdown), negative events, smoking status (length of smoking), and exposure to secondhand smoke (acceptation degree of passive smoking, acquaintance smoking, smoker smoked around, and stay in smoking area). See Supplementary Table S1 for details of definitions and classifications.

Self-efficacy

Self-efficacy was measured using the New General Self-Efficacy Scale short form (NGSES-SF3) [23]. The scale consists of 3 items, with a total score ranging from 0 to 12 points. See Table S2 in the Supplementary Material for details. In this study, the Cronbach's alpha coefficient for the NGSES-SF3 was 0.925.
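For readers unfamiliar with the reliability coefficient reported above, the following is a minimal Python sketch of the standard Cronbach's alpha formula, alpha = k / (k − 1) × (1 − sum of item variances / variance of total scores). It is an illustration only, not the software used in this study; the function name and the toy responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Transpose so that each tuple holds all respondents' scores on one item.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical toy responses: three respondents answering identically on 3 items.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha_consistent = cronbach_alpha(consistent)
```

When every respondent answers all items identically, as in the toy data, the items are perfectly consistent and alpha is 1; less consistent answers give lower values.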

Depression

Depression was measured using the Patient Health Questionnaire-9 (PHQ-9) [24], a nine-item self-report scale developed to assess symptoms of depression. The items were rated on a scale of 0–3 (not at all = 0), and the total scale score ranges from 0 to 27. Symptom severity is indicated by the total score: 0–4 points, no depression; 5–9 points, mild depression; 10–14 points, moderate depression; 15–19 points, moderately severe depression; 20–27 points, severe depression. See Table S3 in the Supplementary Material for details.
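The scoring bands above can be expressed as a short Python sketch. The function name and the wording of the labels are illustrative assumptions; only the score ranges are taken from the text.

```python
def phq9_severity(item_scores):
    """Return (total_score, severity_label) for nine PHQ-9 items scored 0-3."""
    if len(item_scores) != 9 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("PHQ-9 requires nine item scores, each 0, 1, 2, or 3")
    total = sum(item_scores)
    if total <= 4:
        label = "no depression"
    elif total <= 9:
        label = "mild"
    elif total <= 14:
        label = "moderate"
    elif total <= 19:
        label = "moderately severe"  # the 15-19 band
    else:
        label = "severe"
    return total, label


# Hypothetical respondent who answers 1 on every item.
example = phq9_severity([1] * 9)
```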

Anxiety

Anxiety was measured using the Generalized Anxiety Disorder Questionnaire (GAD-7) developed by Robert L Spitzer [25]. The scale consists of seven items and has good reliability, as well as good criterion, construct, factorial, and procedural validity. A cut-off point with optimal sensitivity (89%) and specificity (82%) was identified. See Table S4 in the Supplementary Material for details.

Perceived social support

Social support was measured using the Perceived Social Support Scale short form (PSSS-SF3), which is based on the Zimet Perceived Social Support Scale [21]. The 3-item scale covers three dimensions: family support, friend support, and other support, as shown in Table S5 in the Supplementary Material. Each item was rated on a scale of 1–7 (Strongly disagree = 1), with higher scores indicating greater perceived social support. In this study, the Cronbach's alpha coefficient for the PSSS-SF3 was 0.943.

Statistical analysis

First, we used EmpowerStats for descriptive analysis. Continuous variables were presented as mean ± standard deviation and compared using t-tests; categorical variables were reported as frequency, N (%), and compared using chi-square tests. Next, we conducted univariate and multivariate logistic regression using Stata version 16.0 to examine the relationships between each potential predictor (demographic characteristics, perceived social support, depression, anxiety, and self-efficacy) and the smoking outcome. Then, based on the results of the multivariate logistic regression, the variables related to smoking behavior were entered into a CART analysis performed in R, to identify high-risk populations for increased smoking behavior during COVID-19 and the factors that most strongly influenced the increase in smoking behavior.

Finally, we used R to evaluate the performance of CART. We calculated the sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of the CART model. In this study, sensitivity refers to the probability of correctly predicting smokers as smokers. Specificity refers to the probability of correctly predicting non-smokers as non-smokers. Positive predictive value represents the proportion of true smokers among the sample units predicted as smokers. Negative predictive value represents the proportion of true non-smokers among the sample units predicted as non-smokers. Accuracy refers to the proportion of correctly categorized smokers and non-smokers out of the total.
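The five measures defined above follow directly from the counts of a two-by-two confusion matrix. The Python sketch below illustrates those definitions; it is not the R code used in this study, and the function name and the toy labels (1 = smoker, 0 = non-smoker) are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute sensitivity, specificity, PPV, NPV, and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a denominator is zero.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # smokers predicted as smokers
        "specificity": ratio(tn, tn + fp),  # non-smokers predicted as non-smokers
        "ppv": ratio(tp, tp + fp),          # true smokers among predicted smokers
        "npv": ratio(tn, tn + fn),          # true non-smokers among predicted non-smokers
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Hypothetical toy labels: 2 smokers and 8 non-smokers.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

In the toy data, accuracy (0.8) is higher than sensitivity (0.5) because the many correctly classified non-smokers dominate the total.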

Results

Sample characteristics

A total of 21,916 valid questionnaires were collected. Table 1 shows the demographic, self-efficacy, anxiety, and perceived social support characteristics of the smoking and non-smoking populations (P < 0.01). Compared to non-smokers, smokers were more anxious, more depressed, and less accepting of secondhand smoke; they experienced more negative events, had more acquaintances smoking in front of them, perceived less social support, and had lower self-efficacy (P < 0.001). Both groups were predominantly aged 18–59 years, without chronic disease, and not under COVID-19 lockdown measures (P < 0.01) (Table 1).

Table 1 Descriptive analysis of sample characteristics of smokers and non-smokers (mean ± SD)

Univariate logistic regression analysis and multivariate logistic regression analysis

Univariate regression analysis showed that the impact of the COVID-19 lockdown on livelihoods was not associated with smoking behavior during COVID-19 (P = 0.246) (Table 2). Multivariate logistic regression analysis showed that having a chronic disease, higher perceived social support, lower self-efficacy, 31–40 years of smoking, absence of acquaintances smoking in front of them, staying in the smoking area, and lower acceptance of secondhand smoke were associated with an increase in smoking behavior during COVID-19 (P < 0.05) (Table 3).

Table 2 Univariate logistic regression analysis of smoking behavior change in COVID-19
Table 3 Multivariate logistic regression analysis of smoking behavior change in COVID-19

Classification and regression tree (CART) analysis

The CART analysis (Fig. 1) used the sample of smokers and identified attitudes toward secondhand smoke as the strongest predictor. The 100% in the parent node of Fig. 1 represents the entire smoking population in this study, while the 0.15 represents the 15% of the overall population included in the study. The first branch divided the smoking population into those with an acceptation degree of passive smoking ≥ 12 (76%) and those with a degree < 12 (24%). The branch for those with a degree ≥ 12 and no smokers smoking around them (72%) led to a subgroup with a length of smoking of 30 years or more, accounting for 28% of the total smoking population. The branch for those with a degree < 12 led to non-chronic disease patients.

Within the branch with an acceptation degree of passive smoking ≥ 12, the next split is on whether smokers smoked around the respondent. Among those with an acceptation degree of passive smoking ≥ 12, no smokers smoking around them, and a length of smoking of ≥ 30 years, the subgroup reaches a terminal node, accounting for 28% of the total smoking population. This branching process is repeated until the sample is classified into 15 risk profiles (bottom row of Fig. 1). The subgroup with a high acceptation degree of passive smoking, no smokers smoking around them, and a length of smoking of ≥ 30 years is identified as having the highest smoking risk (34%).

Fig. 1 Classification and regression tree analysis of factors influencing smoking behavior

Performance evaluation of classification and regression tree (CART)

In this study, the CART model demonstrated high specificity (99%), high positive predictive value (71%), high negative predictive value (88%), and high accuracy (87%). However, sensitivity was low (20%), which may be due to class imbalance, that is, the large gap in sample size between smokers and non-smokers in this study (Table 4).

Table 4 Performance of classification and regression tree (CART)

Discussion

Smoking is a closely related factor to COVID-19, and controlling smoking behavior is of great significance for the prevention and treatment of COVID-19. Identifying high-risk subgroups for smoking during the pandemic can enable targeted prevention and effective reduction in smoking behavior. Therefore, this study used CART analysis to identify high-risk subgroups for smoking behavior during COVID-19 and determine the factors that have the deepest influence.

Lockdown is an important factor influencing smoking behavior during COVID-19. It refers to the measures taken by various countries to prevent the spread of COVID-19, such as home quarantine, closure of entertainment venues, and isolation and quarantine measures [26]. Due to the differences in social background and lockdown measures, the impact of lockdown on smoking behavior may also vary [26]. In this study, lockdown mainly refers to three measures: home quarantine, activities within the community, and activities within the city. Previous studies have shown that during lockdown, stress and depression increase, and people tend to smoke more frequently, while the number of people attempting to quit smoking decreases [27]. However, the results of this study suggest that lockdown is associated with a decrease in smoking behavior. This may be due to the inability to purchase cigarettes during the home quarantine period [7] and an increase in motivation to quit smoking due to an increased perception of the harm of COVID-19 [28].

According to the CART model, the subgroup with a high acceptation degree of passive smoking, no smokers smoking around them, and a length of smoking of 30 years or more has the highest smoking rate during the COVID-19 pandemic. The acceptation degree of passive smoking is the main determinant of smoking behavior during the pandemic. This may be because people are more attentive to personal health protection during the COVID-19 pandemic and are more sensitive to the perceived harmfulness of tobacco [29], which may lead to a lower acceptance of secondhand smoke [30], resulting in a reduction in smoking behavior.

According to the CART model, during the COVID-19 pandemic, people are more likely to smoke when no smokers are smoking around them, which is contrary to previous research results. Previous studies have shown that individuals are more likely to start smoking when family and friends around them smoke [31, 32]. The difference may be due to an increase in personal protection awareness during the COVID-19 pandemic. As COVID-19 primarily affects the respiratory system, wearing a mask is an important preventive measure against COVID-19 [33]. During the period of this study, China was still experiencing the peak of the COVID-19 pandemic [19]. Despite individual variations, the Chinese government’s advocacy of mask usage and the concurrent increase in public health awareness led to a high level of acceptance of and compliance with mask-wearing during the COVID-19 pandemic [34]. Even in 2023, when the COVID-19 pandemic had largely subsided, residents continued to exhibit good mask-wearing habits [35]. When people remove their masks to smoke, others may become more attentive to wearing masks due to fear of contracting COVID-19. Thus, when others are not smoking around them, individuals may be more likely to smoke. This conclusion needs to be verified in other countries. This result is opposite to our logistic regression results, which may be because CART examines the interactions between variables; this is also why CART is widely used in exploring risk factors [36, 37]. Additionally, those with a longer smoking history have stronger nicotine dependence and more severe withdrawal symptoms, making it harder for them to reduce smoking behavior [38]. Therefore, our study shows that individuals with a length of smoking of 30 years or more are more likely to smoke during the COVID-19 pandemic.

The group with a length of smoking of 40 years or more is not significant in the multivariate regression results but is included in the CART model. There are two possible reasons for this. On the one hand, CART has greater resistance to multicollinearity than parametric methods [36, 37]. On the other hand, CART is a decision tree model that only considers which variables better predict the increase in smoking behavior during the COVID-19 pandemic and form the best classification, without considering the statistical significance of the variables.

Chronic illness status is also a significant predictor: individuals without chronic illness are more likely to smoke. Smoking is strongly associated with chronic diseases [39], and China’s disease spectrum has shifted towards chronic, non-communicable diseases [40]. Additionally, patients with chronic illness have a higher rate of severe disease after contracting COVID-19 [41, 42]. To reduce the harm of chronic diseases, doctors are more likely to advise patients with chronic illness to quit smoking, and these patients are also more likely to accept smoking cessation advice from doctors [43, 44].

In response to these findings, first, more attention should be paid to long-term smokers, and more specialized smoking cessation help should be provided to them. For example, the smoking cessation clinics actively promoted in China are an effective method [45]. Second, for individuals who are more exposed to secondhand smoke, tobacco education should be strengthened to enhance awareness of the hazards of secondhand smoke. Moreover, the requirement to wear masks in public areas during the COVID-19 period has also reduced smoking behavior. Therefore, during the COVID-19 pandemic, the “mask protection” effect can be fully utilized to guide smoking cessation behavior. Even if individuals around them are not smoking, environmental smoke may still carry and spread the virus, so it is necessary to wear masks and avoid smoking. Finally, doctors and non-chronically ill patients should also raise their awareness of smoking cessation. Tobacco causes great harm to human health, and doctors’ smoking cessation advice is a feasible way to promote patient smoking cessation [46]. Doctors are therefore important agents in promoting smoking cessation and should also actively provide smoking cessation help to non-chronically ill patients.

Finally, this study explored the high-risk groups for smoking. Future studies should also examine the triggers for smoking cessation in greater depth, both to guide tobacco control policies and to build a continuing line of research that enriches policy guidelines.

Strengths and limitations

First, our study was based on a national survey using quota sampling, which can balance differences between regions and reflect the situation nationwide. Second, we focused on smoking behavior during the COVID-19 period and comprehensively analyzed the factors that influence smoking behavior in the context of epidemic prevention and control. Finally, our results further revealed the interactions between the most important risk factor and other influencing factors, thus identifying the high-risk group for smoking during the COVID-19 period.

However, this study also has some limitations. First, the study is a cross-sectional survey and does not establish causal relationships. Second, there may be other risk factors that affect smoking behavior during the COVID-19 period that were not included in this study. Finally, the sensitivity of CART in this study was relatively low, probably because of the small number of smokers, which created a large gap in sample size between smokers and non-smokers. Even so, the accuracy of CART was high.

Conclusion

In general, this study was based on a national sample and used CART analysis to explore the high-risk population for increased smoking behavior during the COVID-19 period. The results showed that people with a high acceptation degree of passive smoking, no smokers smoking around them, and a length of smoking of ≥ 30 years formed the subgroup with the highest smoking behavior during the COVID-19 period. Acceptation degree of passive smoking was the strongest predictor of smoking behavior during the COVID-19 period. It is important to pay more attention to long-term smokers and non-chronic disease patients, raise awareness of the hazards of smoking and secondhand smoke, and take advantage of the “mask effect” during the epidemic period to reduce smoking behavior during the COVID-19 pandemic.