Introduction

At the end of 2019, a severe outbreak of COVID-19 occurred in China. This outbreak led to many factory shutdowns and school closures, and people were required to stay at home until the outbreak was effectively contained. This situation had an impact on all aspects of people's lives, including individual physical and mental health (First et al., 2020; Zhao, 2020). Specifically, people are more likely to become angry, feel stressed, and even develop psychological problems such as anxiety and depression (Bruno et al., 2020; Brennan et al., 2020; First et al., 2020; McKay and Asmundson, 2020; Torales et al., 2020). One study of 194 cities in China found that 54% of respondents rated the psychological impact of the COVID-19 outbreak as moderate or severe; 29% of respondents reported moderate to severe anxiety symptoms; and 17% of respondents reported moderate to severe depressive symptoms (Wang et al., 2020a; Esra et al. 2021). Research focused on Twitter showed that abusive content generated on social media increased significantly following COVID-19, with violence-related content being one of the fastest-growing topics (Babvey et al., 2020). Therefore, people's mental health has become an issue that cannot be ignored in the context of COVID-19.

Effects of exercise on mental health

Exercise has received increasing attention from scholars in all stages of life as a mental health intervention due to its convenience and low cost (Coyle et al., 2020a, b; Wang et al., 2020a). Research has shown that exercise can relieve stress (Zhang et al., 2020b), reduce anxiety levels (Cho and Kim 2020), and decrease the breadth and depth of depression (Zhao and Shi 2016). In addition, exercise can significantly improve sleep disorders and factors related to mental disability in individuals (Zhao and Shi 2016). It can also enhance people's happiness by improving their life satisfaction (Nezlek et al., 2018). It is important to note that the intensity of exercise should be acceptable to the individual. An exercise frequency of three to five times per week could improve mental health (Hu et al., 2020). Therefore, exercise is vital for improving people’s mental health during the COVID-19 epidemic. However, in previous studies on exercise and mental health, researchers have mainly employed traditional research instruments such as questionnaires, interviews, and experiments. These methods are difficult to implement successfully during the COVID-19 epidemic.

Weibo data

Following the advent of the big data era and the rapid development of social media, Twitter and Sina Weibo have developed into online media that allow people to record their daily lives, share information and interact with their friends (Gosling et al., 2011). Sina Weibo is currently the most popular social media platform in China (Li et al., 2014). Using Sina Weibo, we can quickly access a large amount of rich textual material and real-time user data. Based on Sina Weibo, researchers from the Chinese Academy of Sciences (CAS) successfully predicted individuals' personality traits (Bai et al., 2014; Li et al., 2014; Wang et al., 2020a, b) and assessed their mental health statuses, such as depression and subjective well-being (Hao et al., 2014). In addition, they developed Online Ecological Recognition (OER), a system that can use the output of a predictive model to achieve moderate correlation with questionnaire scores (Liu et al., 2018). In summary, it seems that it is possible to detect users’ levels of exercise and daily mental states using Weibo data (Young et al., 2014). Accordingly, we propose Hypothesis 1: Weibo text data reflects individual exercise behavior.

Nonphysical exercise

According to previous research, physical education behavior refers to people's movements and activities, including sports, sports consumption, sports time and space use, and sports performance (Jin et al., 2001). Text from Weibo contains not only information pertaining to individual sports but also discussion of sports-related topics. We therefore classified exercise behaviors into physical and nonphysical exercise behaviors based on the degree of body involvement. The former are exercise behaviors that require individuals' physical participation, such as running and swimming; the latter are exercise behaviors that do not, such as watching sports events, browsing sports news, and participating in sports-related discussions. It has been shown that the characteristics of sports media messages (i.e., lighthearted, stimulating, and fun) satisfy people's needs for entertainment, emotional stimulation, and anxiety relief (Sun and Zhang, 2016) and that sports media (e.g., sports events, videos, and news) can increase well-being by enhancing residents' awareness of fitness, enthusiasm for exercise, proper values, cohesion, and life satisfaction (Zhou, 2016). Accordingly, we propose Hypothesis 2: During the epidemic, the mental health indicators of the nonphysical exercise group are better than those of the nonexercise group.

Hypotheses and research overview

In summary, this study used Weibo text data to explore differences in psychological and behavioral vocabulary expression features and in mental health indicators among users who exhibited different exercise behaviors during the COVID-19 epidemic. The aim was to provide scientific online measurement tools and support for research on exercise and mental health during the epidemic. To test the hypotheses, we designed two studies; the study flow is shown in Fig. 1.

  1. Study 1 was used to test Hypothesis 1. Specifically, we first searched for two categories of sports behavior-related vocabulary in the Weibo data material based on two types of sports behavior (physical sports behavior and nonphysical sports behavior); subsequently, based on these two categories, we constructed a Weibo exercise behavior user vocabulary and an exercise behavior classification program.

  2. Study 2 was used to test Hypothesis 2. Specifically, we first used an exercise behavior classification program to group Weibo users into certain exercise behavior categories. Subsequently, we analyzed the differences in the features of verbal expression and mental health indicators exhibited by members of different exercise behavior groups to explore the effects of exercise behavior on psychological traits.

Fig. 1 Research flow chart

(Supplement: We sorted users who engaged in physical exercise behaviors at least twice per week into the physical exercise group (Dong & Mao, 2018), users who engaged in nonphysical exercise behaviors at least once per week were categorized as part of the nonphysical exercise behavior group, and the remaining users were included in the nonexercise group).

Study 1: Construction of a Weibo exercise behavior user dictionary and classification program

Methods

Construction of a Weibo exercise behavior user dictionary

Vocabulary source. Using the open API provided by Sina Weibo, we used a Python spider program (a program that automatically crawls website content) to collect keywords and hot topics related to mainstream sports and fitness, namely basketball, soccer, badminton, tennis, volleyball, table tennis, swimming, running, and fitness. The crawl drew on a sample of approximately 100,000 noncertified (regular) Weibo users from January 2020 to October 2020 and covered 30,000 blog posts for each exercise topic. By filtering keywords and topics, users who participated in, watched or discussed the related sports could be targeted quickly. Figure 2 shows an example blog post featuring the keyword "Sports is Persistence".

Fig. 2 Blog post content for the topic #Sports is Persistence#

(Supplement 1: The Weibo posts used in this study were extracted by a graduate student working as part of this research group. The extraction itself was largely independent of the data collector, and results concerning the post data are consistent as long as the same procedure is used.

Supplement 2: The eight sports included in this study were determined based on a survey of the Chinese Sports Venue Report (2019) conducted by the General Administration of Sport of China. The survey focused on the construction of venues for the eight types of sports listed above, which indicated that these sports were easily accessible to ordinary people in China; thus, the eight sports chosen were appropriate for this study.)

Database of sport tendency classification

Twenty research assistants directly judged the exercise tendency indicated by each keyword or topic. The keywords and topics were divided into a physical exercise tendency category and a nonphysical exercise tendency category, and the vocabulary database used in this study was constructed to facilitate subsequent vocabulary extraction. Table 1 displays an example of the classification.

Table 1 Classification of keywords and topics pertaining to sports tendencies

(Supplementary: the research assistant group included 10 undergraduate students majoring in physical education and 10 undergraduate students with other majors, including five assistants of each gender in each group).

Extract vocabulary

We extracted 2,000 microblogs, including IDs, times and texts, for each exercise topic (basketball, soccer, badminton, tennis, volleyball, table tennis, swimming, running, and fitness), half with a physical exercise tendency and half with a nonphysical exercise tendency. In total, 18,000 blog posts were extracted for the statistical analysis of word frequency. The Jieba Chinese word segmentation tool was used to segment the collected data. Because Jieba cannot filter out vocabulary unrelated to sports, ten research assistants identified and removed a large number of high-frequency words unrelated to sports, such as "we", "today", and "one". Subsequently, the 300 most frequent remaining words in each sport category were extracted. Table 2 shows an example of the vocabulary.

Table 2 Examples of Extracted Keywords Pertaining to Sport Events in Weibo (Fitness)

Filter and summarize dictionary

The research team collected the first 300 meaningful words of the physical exercise vocabulary and the nonphysical exercise vocabulary for various sports into two Excel tables and deleted the duplicates, thereby obtaining 433 words for the physical exercise vocabulary and 1076 words for the nonphysical exercise vocabulary, which jointly constituted the Weibo exercise behavior user dictionary.
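The word-frequency and deduplication steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: posts are assumed to be already segmented into tokens (the study used Jieba for that step, which is not reproduced here), the function names `top_words` and `merge_vocabularies` are hypothetical, and the toy English tokens stand in for real Weibo vocabulary.

```python
from collections import Counter


def top_words(tokenized_posts, stopwords, n=300):
    """Return the n most frequent tokens after dropping unrelated words.

    tokenized_posts: list of token lists (segmentation is assumed done upstream).
    stopwords: high-frequency words unrelated to sports, chosen by annotators.
    """
    counts = Counter(
        token
        for post in tokenized_posts
        for token in post
        if token not in stopwords
    )
    return [word for word, _ in counts.most_common(n)]


def merge_vocabularies(per_sport_words):
    """Combine per-sport word lists, dropping duplicates, keeping first-seen order."""
    merged = {}
    for words in per_sport_words:
        for word in words:
            merged.setdefault(word, None)
    return list(merged)


# Toy data (hypothetical): two sport topics, already segmented.
running_posts = [["today", "run", "5km"], ["we", "run", "check-in"]]
fitness_posts = [["today", "gym", "check-in"], ["gym", "check-in", "squat"]]
stopwords = {"we", "today", "one"}

running_top = top_words(running_posts, stopwords, n=2)
fitness_top = top_words(fitness_posts, stopwords, n=2)
dictionary = merge_vocabularies([running_top, fitness_top])
```

In the study, the cutoff was 300 words per sport category and the merge was done in Excel; the toy cutoff of 2 is only there to keep the example small.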

Construction and verification of the exercise behavior classification model

Exercise behavior classification model

The words included in Weibo posts were colloquial and random, and there was a great deal of ambiguous information contained in microblog posts, so it was difficult to identify sports behavior correctly based only on word matching. As shown in Table 3, this study judged Weibo posts in terms of three aspects, i.e., ① judging priority, ② the vocabulary matching method and ③ negative words, to eliminate the ambiguous information contained in blog posts as much as possible.

Table 3 Examples of sports classifications in blog posts

In this context, ① judgment priority was based on the phrases (physical exercise phrases and nonphysical exercise phrases) included in the microblog text to determine whether the microblog text pertained to physical exercise or nonphysical exercise. For example, if physical exercise words were included in the text, the microblog text was judged to contain physical exercise behavior. This study developed two ways of judging priority, that is, giving priority to physical exercise expressions and giving priority to nonphysical exercise expressions. ② The vocabulary matching method employed either exact matching or fuzzy matching based on the Weibo exercise behavior user dictionary; that is, the former process entailed matching the exact phrase(s) used in a given microblog text to the dictionary, while the latter involved matching all the characters that composed the phrase(s) in question, in a manner related to the expressive characteristics of Chinese. ③ The aspect pertaining to negative words referred to whether negative words were identified, and if a negative word appeared in a certain microblog text, that text was judged to refer to nonphysical exercise.
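The three judgment aspects can be illustrated with a small rule-based sketch. It is one possible reading of the description, not the authors' program: exact matching is taken to mean that the dictionary phrase occurs contiguously in the post, fuzzy matching that every character of the phrase occurs somewhere in the post, and the negation rule is applied only to posts that matched a dictionary word. The function names, the two-word dictionaries and the negation cues are hypothetical; the labels 0, 1 and 2 follow the marking scheme used in this study.

```python
PHYSICAL, NONPHYSICAL, NONEXERCISE = 0, 1, 2


def matches(phrase, text, fuzzy):
    """Exact: phrase occurs contiguously. Fuzzy: all its characters occur in text."""
    if fuzzy:
        return all(ch in text for ch in phrase)
    return phrase in text


def classify(text, physical_words, nonphysical_words, *,
             priority=NONPHYSICAL, fuzzy=True,
             use_negation=False, negations=("没", "不")):
    """Label one post using priority, matching mode and an optional negation rule."""
    hit_physical = any(matches(w, text, fuzzy) for w in physical_words)
    hit_nonphysical = any(matches(w, text, fuzzy) for w in nonphysical_words)
    if not (hit_physical or hit_nonphysical):
        return NONEXERCISE
    # Assumption: a negation cue turns a matched post into a nonphysical one.
    if use_negation and any(neg in text for neg in negations):
        return NONPHYSICAL
    if hit_physical and hit_nonphysical:
        return priority  # the judging priority decides when both kinds match
    return PHYSICAL if hit_physical else NONPHYSICAL


# Toy dictionaries (hypothetical): "跑步" = running, "比赛" = match/game.
physical_words = ["跑步"]
nonphysical_words = ["比赛"]

# "跑" and "步" appear separately, so only fuzzy matching finds "跑步".
post = "今天跑了五公里步道"
label_fuzzy = classify(post, physical_words, nonphysical_words, fuzzy=True)
label_exact = classify(post, physical_words, nonphysical_words, fuzzy=False)
```

The example post shows why fuzzy matching raises recall for colloquial text: the two characters of the dictionary word are present but not adjacent, so exact matching misses the post.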

Verification performance of the classification model

A total of 3000 blog posts were extracted from the physical exercise and nonphysical exercise post databases, and 20 research assistants labeled the blog posts. Physical exercise blog posts were marked as 0, nonphysical exercise blog posts were marked as 1 and nonexercise blog posts were marked as 2. When the annotators disagreed, the majority label was adopted. We used these manual labels as the reference standard for verifying the dictionary-based classification. Eventually, a total of 8 experimental datasets for exercise behavior classification were output according to priority (2 types: priority identification of physical exercise words or nonphysical exercise words), matching pattern (2 types: exact matching or fuzzy matching) and negation (2 types: including negation or not including negation), as shown in Table 4.
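As a rough illustration of the labeling and experimental design above, the sketch below resolves annotator disagreement by majority vote and enumerates the 2 × 2 × 2 = 8 configurations. The names are hypothetical, the enumeration order is arbitrary and does not correspond to the model numbering in Table 4, and tie handling is an assumption because the text does not say how ties were resolved.

```python
from collections import Counter
from itertools import product


def majority_label(votes):
    """Return the most common label among annotators.

    Assumption: on a tie, the label that reached the top count first in the
    vote list wins; the study does not describe its tie-breaking rule.
    """
    return Counter(votes).most_common(1)[0][0]


# Labels: 0 = physical exercise, 1 = nonphysical exercise, 2 = nonexercise.
votes_for_one_post = [0, 0, 1, 2, 0]
gold = majority_label(votes_for_one_post)

# The three binary design factors give eight classifier configurations.
configs = [
    {"priority": priority, "matching": matching, "negation": negation}
    for priority, matching, negation in product(
        ("physical", "nonphysical"), ("exact", "fuzzy"), (False, True)
    )
]
```

Each configuration would then be run over the 3000 labeled posts and compared with the majority labels.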

Table 4 Comprehensive evaluation of the classification model

In this study, precision P, recall R and F1-score were used as evaluation indices for the classification model. We tested the effectiveness of the dictionary by calculating the performance of the classification model instead of by measuring the effectiveness of the dictionary directly (Meng et al., 2020). If the classification model based on the Weibo exercise behavior user dictionary exhibited high performance, that would indicate that the exercise behavior user dictionary was effective.

Research results

In the evaluation of classification models in machine learning and data mining, precision (P), recall (R), and F1-score (F1) are commonly used to evaluate the quality of the results (Zhou et al., 2017). Specifically, P is the ratio of correctly predicted positive data to all data predicted as positive, and R is the ratio of correctly predicted positive data to all actual positive data (here, positive data are exercise behavior microblogs). The F1 value is the harmonic mean of precision and recall, F1 = 2PR / (P + R); the higher the F1 value is, the better the prediction performance of the classification model (Yang, 2018).
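For concreteness, the three metrics can be computed from binary labels as in the minimal sketch below, where a positive item is a post that is an exercise behavior microblog. The function names and the toy labels are illustrative; the last lines only check that the P and R values reported in this paper for Model 6 reproduce its reported F1.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (truthy = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)      # correctly predicted positives
    fp = sum(1 for t, p in pairs if not t and p)  # predicted positive, actually negative
    fn = sum(1 for t, p in pairs if t and not p)  # missed positives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): 1 = exercise post, 0 = other post.
p, r, f1 = precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])

# Consistency check with the reported Model 6 values (P = 0.856, R = 0.923).
model6_f1 = round(f1_from(0.856, 0.923), 3)  # 0.888
```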

The classification results of the 8 datasets were input into a Python calculation program, and P, R and F1 were summarized in Table 4. The F1 of the models that gave priority to nonphysical exercise (Models 1–4) was slightly higher. Adding negative words did not greatly improve performance: P increased slightly, but R and F1 decreased slightly. The matching pattern had a large influence on performance. Exact matching yielded the highest P, with Model 3 reaching 0.880, but its strict standard limited R (0.791); that is, the probability of correctly identifying an exercise blog post was low. In contrast, the R of the fuzzy matching models was higher than 0.908 in all cases; in particular, the R of Model 6 reached 0.923, indicating that 92.3% of exercise blog posts could be correctly classified by this model.

Overall, the difference between P and R was larger for the exact matching models, with all F1 values below 0.831, while the difference was smaller for the fuzzy matching models, with all F1 values above 0.868. Therefore, the models that prioritized nonphysical exercise and used fuzzy matching performed better in this study, with Model 6 performing best.

Based on the Weibo exercise behavior dictionary, Study 1 developed an exercise behavior classification model that exhibited good classification performance. This outcome verified Hypothesis 1; namely, by constructing an exercise behavior lexicon, this study could identify exercise behaviors in microblog text. It also set up the next step: in Study 2, users were grouped according to their Weibo content (physical exercise group, nonphysical exercise group and nonexercise group), and the differences between groups in each index were tested using the sentiment word features that the program output from the post text. The goal was to explore the relationships among sports behavior, mental health and features of linguistic expression.

Study 2: The relationships among sports behavior, mental health and features of linguistic expression

Method

Research samples

The sample was similar to that reported for Study 1. Using an open API port provided by Sina Weibo, the study randomly collected all Weibo blog post data from approximately 100,000 uncertified Weibo users from January 23 to April 7, 2020 via the Python Spider program (a program that automatically crawls the content of websites). Only active users with more than 50 original blog posts during the period were retained, and reposted content, votes, pictures, etc., in Weibo blog posts were deleted.

Research tools

Sports behaviors counting program

Based on the sports behavior classification model, all blog posts were labeled (0 for physical exercise blog posts; 1 for nonphysical exercise blog posts; 2 for nonexercise blog posts), and the users were subsequently grouped by counting the occurrences of the two sports behaviors using a Python program.

TextMind Chinese psychoanalysis system

The Computational Cyber Psychology Lab (CCPL) team of the Chinese Academy of Sciences (CAS) compiled a Simplified Chinese Language Inquiry and Word Count (SC-LIWC) dictionary based on the traditional LIWC dictionary (LIWC is a tool to measure psychological characteristics in language), which was revised by researchers in Taiwan (Zhang, 2015; Rui et al., 2013), and added the 5000 most frequently used words in microblogs. Subsequently, based on this dictionary, the CCPL researchers developed the "TextMind" system for the psychoanalysis of Chinese texts (Zhao and Shi, 2016), which has been widely used by Chinese scholars for Chinese language analysis. "TextMind" has various functions, such as automatic word segmentation in simplified Chinese and psychological analysis of language. The psychological and behavioral features of linguistic expression output by the TextMind system can reflect the user's state with respect to emotions, modes of behavior, etc. (Zhao, 2020).

Online Ecological Recognition (OER) system

Researchers at the Chinese Academy of Sciences used a machine learning prediction model to automatically identify psychological features and measure psychological indicators in ecological online social media data, which was called the Online Ecological Recognition (OER) system. The predictive model used in OER establishes a relationship between dynamic features and related questionnaire scores using a machine learning algorithm, and the scores predicted by the model are moderately correlated with scores on the questionnaire. This predictive model has been used and confirmed many times (Li et al., 2020, 2021; Liu et al., 2018).

Research procedures

Grouping of Weibo Users

We sorted users who engaged in physical exercise behaviors at least twice per week into the physical exercise group (Dong & Mao, 2018), those who engaged in nonphysical exercise behaviors at least once per week into the nonphysical exercise behavior group, and the remainder of users into the nonexercise group. Ultimately, a total of 7651 users were matched in this manner.
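The grouping rule can be sketched as below. The text does not say how "per week" was computed, so this sketch assumes an average weekly rate over the Study 2 collection window (January 23 to April 7, 2020), and it assumes that a user who meets both thresholds is placed in the physical exercise group. The function name `assign_group` and both assumptions are illustrative rather than the authors' implementation.

```python
from datetime import date

# Study 2 collection window, counted inclusively (76 days, about 10.86 weeks).
STUDY_START = date(2020, 1, 23)
STUDY_END = date(2020, 4, 7)
WEEKS = ((STUDY_END - STUDY_START).days + 1) / 7


def assign_group(n_physical_posts, n_nonphysical_posts, weeks=WEEKS):
    """Assign a user to a group from post counts (assumed average weekly rates)."""
    if n_physical_posts / weeks >= 2:      # at least twice per week
        return "physical"
    if n_nonphysical_posts / weeks >= 1:   # at least once per week
        return "nonphysical"
    return "nonexercise"


# Toy users (hypothetical post counts over the whole window).
groups = [assign_group(32, 0), assign_group(5, 21), assign_group(3, 3)]
```

Under these assumptions, a user needs at least 22 physical exercise posts, or at least 11 nonphysical exercise posts, over the window to leave the nonexercise group.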

  • Substudy 1: Differences in psychological and behavioral vocabulary features among different users.

    For the data of each user, the study computed 88 dynamic features of LIWC using the LIWC-compatible “TextMind” system. First, the Chinese word segmentation tool included in the “TextMind” system divided the user's original Weibo content into words or phrases with linguistic annotations, such as verbs, nouns, gerunds, and objects; e.g., "I'm not happy today" is split into "I", "today", "no", and "happy". Categories of words with psychological meaning were then extracted using the SC-LIWC dictionary, and their feature values were calculated and output. Each LIWC feature is the ratio of the frequency of the words included in the corresponding dimension to the total number of words.

    All the blog posts were assembled into a text file, and all the text files of users in the physical exercise group, the nonphysical exercise group and the nonexercise group were input into the "TextMind" system in turn. The word separation method chosen was "LTP word splitter". The feature extraction granularity was "per file", and the output format was "csv". The "TextMind" system could output more than 100 kinds of features. Independent variables were the three exercise categories, and the dependent variables were the psychological vocabulary expression features and behavioral vocabulary expression features output by TextMind. Psychological vocabulary expressions included positive and negative emotions, anxiety, anger, and sadness. Behavioral vocabulary expressions included health, death, work, and leisure.

  • Substudy 2: Differences in mental health among different users.

    All blog posts from the three groups of Weibo users were input into the OER system, and the mental health indicators output automatically by the OER system were imported into SPSS 26 statistical software for difference analysis. The independent variable was the user's exercise category (three groups), and the dependent variables were the six mental health indicators of life satisfaction, hostility, terror, anxiety, depression and stress. See Fig. 3 for the research process.

    Fig. 3 Flow chart of Substudy 2
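The feature definition used in Substudy 1, in which each feature is the share of a user's words that fall into a given category, can be sketched as follows. This is not the TextMind implementation: the function name `category_ratios`, the category names and the tiny lexicon are hypothetical stand-ins for the SC-LIWC dictionary, and the tokens are the segmented example sentence from the text.

```python
def category_ratios(tokens, category_lexicon):
    """Return, per category, the count of matching tokens divided by all tokens."""
    total = len(tokens)
    features = {}
    for category, words in category_lexicon.items():
        hits = sum(1 for token in tokens if token in words)
        features[category] = hits / total if total else 0.0
    return features


# Segmented form of "I'm not happy today", as given in the text.
tokens = ["I", "today", "no", "happy"]

# Toy lexicon (hypothetical): category name -> set of words.
toy_lexicon = {
    "positive_emotion": {"happy"},
    "negation": {"no"},
    "anxiety": {"worried"},
}

features = category_ratios(tokens, toy_lexicon)
```

Because the features are ratios rather than raw counts, users who post different amounts of text remain comparable.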

Research results

A one-way ANOVA on the three groups revealed that some variables had unequal variances across groups. We therefore used Dunnett's T3 test for the post hoc comparisons and obtained the following results.

Basic information of Weibo users

See Table 5 for basic information concerning the three groups of Weibo users. The proportion of male users was highest in the nonphysical exercise group (61.9%), and the proportion in the physical exercise group (35.4%) was higher than that in the nonexercise group (26.7%). Thus, the proportion of male users was higher in both exercise groups than in the nonexercise group. This finding is consistent with previous studies, which have found that male students exercise more frequently and have better exercise habits than female students (Faulkner, 2020; Zheng, 2020) and that men pay more attention than women to sporting events and programs and participate more in nonphysical exercise activities (Seok, 2014). Users in the physical exercise group had an average of 32.1 physical exercise blog posts, and users in the nonphysical exercise group had an average of 20.73 nonphysical exercise blog posts. For nonexercise users, the average number of the two kinds of exercise posts was less than 4.

Table 5 Basic information of each group of users

Differences in the expressive features of psychological and behavioral vocabulary

During the most severe period of the epidemic, there were no significant differences in positive emotional vocabulary features among the groups, but with respect to the four negative psychological vocabulary features of negative emotions, anxiety, anger and sadness, the user feature values of the two sports groups were significantly lower than those of the nonexercise group, and those of users in the physical exercise group were significantly lower than those of users in the nonphysical exercise group. See Table 6 for detailed information.

Table 6 Differences in mental vocabulary expression features

The health feature values of the physical exercise group were significantly higher than those of the nonphysical exercise group and the nonexercise group, but there were no significant differences between the nonphysical exercise group and the nonexercise group. The death feature values of the nonexercise group were significantly higher than those of the two sports groups. Regarding leisure features, the nonphysical exercise group scored significantly higher than the physical exercise group, and the physical exercise group scored significantly higher than the nonexercise group. The working features of the nonphysical exercise group were significantly higher than those of the other two groups. See Table 7 for detailed information.

Table 7 Differences in behavioral vocabulary expression features

Differences in mental health across the 3 sports groups

Table 8 shows that there were no significant differences between the nonphysical exercise group and the physical exercise group in hostility, terror and stress, and that both groups were superior to the nonexercise group on these indicators. These findings showed that exercise could reduce individuals' levels of hostility, fear and stress whether or not it involved physical participation.

Table 8 Differences in mental health indicators across the 3 sports groups

The depression level of the physical exercise group was significantly lower than that of the nonexercise group; that is, physical exercise reduced the level of depression in individuals significantly more than nonphysical exercise. There were no significant differences in life satisfaction and anxiety across the three groups; that is, among the psychological indicators associated with the outputs of the microblog text, physical exercise and nonphysical exercise did not reduce individuals' anxiety levels or improve their life satisfaction. In summary, nonphysical exercise behavior and physical exercise behavior are positively related to certain mental health indicators, which can improve people's levels of mental health during the epidemic. This part of the study verified Hypothesis 2; in the context of the epidemic and based on the data, this study can better verify the relationship between individual exercise behaviors and mental health. That is, physical exercise and nonphysical exercise can improve mental health.

Discussion

Construction of the Weibo exercise behavior user dictionary

The Weibo exercise behavior user dictionary contains 433 physical exercise words and 1076 nonphysical exercise words. The vocabulary included in the dictionary focuses on eight popular sports, which can not only be used to identify the frequency and sports category of physical exercise but can also identify users' behaviors with respect to watching and commenting on sports events and news.

Through the construction of the exercise behavior classification model, this study produced the effective classification Model 6 (P = 0.856, R = 0.923, F1 = 0.888), indicating that the microblog user dictionary was effective. Accordingly, Hypothesis 1 was verified; by constructing the exercise behavior dictionary, this study could identify exercise behavior in microblog text. This conclusion is broadly consistent with previous dictionary-based studies of social media. Xiao et al. (2015) analyzed the emotions expressed during popular public events by constructing an emotion dictionary and effectively identified the emotional indicators in microblog texts. Similarly, Zhang et al. (2019) compiled an emotional dictionary for sudden events and effectively analyzed the emotional changes exhibited by microblog users during popular events. Based on this and previous studies, we posit that the content of variables in big data texts can be identified more effectively if the constituents that define the variables are described accurately and the relevant mental or physical dictionary is constructed.

The development of the Weibo exercise behavior user dictionary is helpful with respect to breaking through the limitation of a traditional text analysis dictionary. Use of the Weibo exercise behavior user dictionary and classification program makes it possible to obtain user exercise characteristics from Weibo data more quickly and accurately in future research in the sports field.

Performance of the exercise behavior classification model

The results of Study 1 showed that the model developed in this study performed well. In terms of judging priority, different priorities had a stronger impact on the fuzzy matching models and less impact on the exact matching models. Because the fuzzy matching models exhibited better performance and a higher F1, we chose to prioritize nonphysical exercise behaviors and use fuzzy matching. We found that adding negative words had little effect on the model. Some researchers have highlighted the possible influence of negative adverbs in Chinese semantic analysis (Lv et al., 2015; Zhang et al., 2020a, b), but other results have shown that considering negative adverbs in isolation has little effect on the predictive model (Lai et al., 2013; Wang et al., 2019). The present results were consistent with the latter findings (Lai et al., 2013; Wang et al., 2019). We believe that this may reflect the expressive habits of Weibo users, who were not used to describing what they did not do, possibly because of the personal guilt triggered when planned exercise is not accomplished.

In addition, the classification performance of fuzzy matching was better than that of exact matching. Similar results have been reported in previous studies. Kindt (2020) searched for known plant names and found that the success rate of fuzzy matching ranged between 94.7% and 99.9%, which was higher than that of exact matching. Wang et al. (2020a, b) constructed the Weibo work recovery user dictionary based on frequency statistics obtained by fuzzy matching the target words in the user dictionary, and the resulting model exhibited a high F1 value. We believe the reason may be that exact matching follows an “all or nothing” principle: a word in a post is counted only if it is completely consistent with the target word, in which case the post is scored as containing the corresponding exercise behavior; otherwise, it is not counted. However, the vocabulary in the user dictionary developed in this research is mainly drawn from the top 300 words in the statistics of sports-related keywords and topical blog posts on Weibo, and these high-frequency words are relatively general and formal. Many users wrote more freely and casually and used livelier, more colloquial expressions in their blog posts. When researching texts on social networking platforms such as Weibo, we should fully consider users' expression habits to ensure the accuracy of the textual analysis (Liu et al., 2015; Kim et al., 2021).

The relationship between sports behavior and features of linguistic expression

The analysis of differences in mental vocabulary found no significant differences in positive emotional linguistic features among the two exercise groups and the nonexercise group. However, for negative emotions, anxiety, sadness and anger, the feature values of the physical exercise group were significantly lower than those of the nonexercise group, indicating that sports behavior could reduce the frequency of users' negative vocabulary usage. Although a mental vocabulary feature value cannot be equated with a mental health score, the psychological and behavioral features of linguistic expression output by the "TextMind" system could reflect the user's emotional state during the COVID-19 epidemic (Li et al., 2020; Liu et al., 2019).

With respect to behavioral vocabulary, the health feature values of the physical exercise group were significantly larger than those of the nonphysical exercise group and the nonexercise group, but there were no significant differences between the nonphysical exercise group and the nonexercise group. The death feature is the opposite of the health feature, and the death feature values of the nonphysical exercise group were significantly lower than those of the nonexercise group, indicating that the users included in the physical exercise group seldom posted blogs pertaining to diseases, deaths and disasters during the epidemic but paid more attention to topics such as preventing the epidemic and remaining healthy. Researchers have pointed out that health motivation is an important factor affecting sports behaviors (Farholm and Rensen, 2016; Zhang et al., 2019). With respect to the leisure and work features, the work feature values exhibited by the nonphysical exercise group were significantly higher than those exhibited by the nonexercise group. Nonphysical exercise blog posts contain some information related to professional athletes, such as transfer news about basketball and football players and sports advertisements, which may explain the higher work feature value. Playing sports, watching sports and commenting on sporting events are all kinds of leisure activity, and the leisure feature values of the two sports groups were higher than those of the nonexercise group.

In short, we found significant differences in the health, death, leisure and work features among users in the different exercise category groups. These findings could improve our understanding of the characteristics of different types of users' online vocabulary expressions as well as the concerns of different user groups in this context.

The relationship between sports behavior and mental health during the COVID-19 epidemic

Some psychological indices in the physical exercise group were better than those in the nonexercise group. Compared with people who did not exercise, people who received exercise interventions or who engaged in more exercise exhibited lower levels of obsessive–compulsive disorder, depression, anxiety, stress (Duncan et al., 2020; Fortes, 2020; Chen et al., 2021), hostility, paranoia, somatization and terror (Wang et al., 2020a; Tian et al., 2020) during the COVID-19 epidemic. These results were consistent with those of previous studies.

In addition, the nonphysical exercise group was superior to the nonexercise group in terms of the indicators of stress, terror, and hostility, indicating that nonphysical exercise might also have a positive impact on mental health. This conclusion also essentially verified Hypothesis 2, namely, that both physical exercise and nonphysical exercise could improve mental health. However, this article did not investigate this topic further, and few studies have been conducted in China to investigate the effects of nonphysical exercise on mental health and the mechanisms underlying those effects. We propose the following possible explanations for the positive impact of nonphysical exercise:

First, previous research has found that the more attention people pay to sporting events and other sports information in the media, the more active their sporting behavior is; that is, attention to sporting events is positively correlated with the frequency of engaging in sports behavior (Sun and Zhang, 2016; Nansa, 2019). Therefore, it is possible that nonphysical exercise behavior increases the frequency of physical exercise behavior and thus affects mental health.

Second, Gallese and Lakoff (2005) found that excitement as measured in the brain center was the same when subjects actually performed grasping movements and when they imagined performing them. Psychological simulation causes the individual to return to a previous sensory experience state, activates the corresponding sensory areas of the brain, and thus influences the individual's cognition, judgment and behavior (Barsalou, 2008; Wei et al., 2018). Ye (2010) claimed that embodied simulation, e.g., in terms of metaphor, body state and action, was an important way of generating cognitions. Watching sporting events or videos and other forms of nonphysical exercise might trigger this simulation mechanism for physical exercise, so that individuals enter a sensory state similar to that achieved while participating in sports, thus gaining pleasure and relaxation and further affecting their mental health.

Accordingly, we maintain that nonphysical exercise may not reduce individuals' depression levels, but it appears to play a positive role similar to that of physical exercise in improving mental health indicators such as stress, terror, and hostility. This finding might offer new insights into ways of improving individuals' mental health. The question of how nonphysical exercise plays a positive role in this process is worth further exploration.

Limitations and future research

The Weibo exercise behavior dictionary discussed in this study contained only eight popular sports and fitness-related words. Generally, the more topics and entries such a dictionary contains, the better its ability to recognize texts; accordingly, future research can expand the content of this dictionary. In addition, this research used only the text of blog posts by Weibo users, while microblog images and videos are also of great value. Such data features have been verified several times in studies outside China and have been found to be effective in predicting psychological characteristics such as the personality traits of social media users (Kosinski et al., 2013; You et al., 2015). More importantly, this research found that nonphysical exercise behaviors can improve certain mental health indicators in individuals, but it did not explore in depth the causes of these effects or the mechanisms underlying them, which also represents a direction for future research.

Conclusion

This study investigated whether exercise behavior in microblogs could be identified by constructing a dictionary and whether physical and nonphysical exercise could improve the mental health of individuals during an epidemic. By constructing a Weibo user behavior dictionary, we found that the exercise behavior classification procedure could improve our ability to identify exercise behaviors in Weibo text. Moreover, the physical and nonphysical exercise groups significantly outperformed the nonexercise group with respect to certain mental health indicators and employed less negative emotional vocabulary. On this basis, this research provides a scientific online evaluation methodology and support for research concerning exercise and mental health during the COVID-19 epidemic.