Background

Hand, foot, and mouth disease (HFMD) is an infectious disease that causes blistering rashes on the mouth, hands, and feet. Most infected individuals recover from HFMD within a few days of infection. However, various complications, including myocarditis, neurogenic pulmonary edema, acute flaccid paralysis, and central nervous system complications such as meningitis, cerebellar ataxia, and encephalitis, can also occur [1, 2]. HFMD is distributed worldwide, and outbreaks often occur during the summer and early fall in the United States. Large outbreaks in Cambodia, China, Japan, Korea, Malaysia, Singapore, Thailand, and Vietnam have been reported over the past 2 decades [3, 4]. HFMD is seasonal in temperate Asia, with a summer peak, and in subtropical Asia, with spring and fall peaks, but not in tropical Asia, which suggests that climate plays a role in HFMD seasonality in temperate Japan [5]. During the summer of 2011, Japan had the largest HFMD epidemic on record, with 347,362 reported cases [6]. Coxsackievirus A6 (CV-A6) infection is responsible for most cases, with co-circulation of coxsackievirus A16 (CV-A16) and enterovirus A71 (EV-A71) [7]. EV-A71 was sporadically detected from October 2014 onward. It became the predominant serotype in 2018, with approximately 70,000 reported cases, following an increase in its spread from the end of 2017 [8]. Since June 2019, a severe outbreak of HFMD has occurred in multiple regions of Japan, attracting public attention [9]. As enteroviruses can spread rapidly by droplet and fomite transmission among children in daycare centers and kindergartens, understanding HFMD outbreaks is vital to public health, particularly during the Coronavirus Disease 2019 (COVID-19) pandemic [10].

Rapid recognition and reporting of HFMD infection are essential, and several studies have constructed models to explain HFMD infection [11,12,13,14,15]. Rui et al. explored epidemiological characteristics and calculated the early warning signals of HFMD using a logistic differential equation (LDE) model in seven regions of China [11]. Yu et al. forecasted the number of HFMD cases with wavelet-based hybrid models in Zhengzhou, China [12]. Zhang et al. proposed a landscape dynamic network marker (L-DNM) to detect pre-outbreak signals of HFMD in Tokyo, Hokkaido, and Osaka, Japan [13]. Gao et al. used monthly HFMD infection cases and meteorological data to construct a weather-based early warning model with a generalized additive model across China [14]. Zhao et al. combined a meta-learning framework with Baidu search queries for real-time estimation of HFMD cases [15]. These studies used a range of data, including monthly or weekly HFMD case counts [11, 12, 14]; dynamic information from city networks, horizontal high-dimensional data, and records of clinic visits [13]; meteorological data [14]; and Baidu search query data [15]. In Japan, such data are relatively difficult to access or are updated with delays. Traditional surveillance and reporting systems lag an outbreak by 1 to 2 weeks because of the reporting and verification process. In Japan, the National Institute of Infectious Diseases (NIID) has monitored outbreaks of various infectious diseases and issued weekly reports since 1999, but these reports are delayed by several weeks [16]. In addition, no previous studies have used Internet search data to examine changes in HFMD associated with the COVID-19 pandemic. As of August 2022, Google search data was considered reliable because Google's market share in Japan has been over 70% since 2009 [17, 18]. Google Trends data offer real-time, labor-saving information on public searches, which may be valuable for infection surveillance.

“Infodemiology” is defined as the science of the distribution and determinants of information in an electronic medium, specifically the Internet, or in a population, with the aim of informing public health and policy [19]. Google Trends is frequently used in infodemiology research to gauge public interest [20]. Google Trends reflects public information-seeking behavior and allows users to analyze Google search data for specific search terms in any country or region over a selected period [21, 22]. Studies have shown that online query trends correlate with real-life epidemiologic phenomena such as flu [23], sinusitis [24], lifestyle-related disease [25], asthma [26], and pruritus [27]. Researchers have also investigated public interest and information-seeking behaviors in chronic obstructive pulmonary disease (COPD) [28], cancer [29, 30], bariatric surgery [31], kidney stone surgery [32], and suicide [33]. During the COVID-19 pandemic, similar studies using Google Trends search data were conducted to predict COVID-19 infectious cases [34, 35], explore public attitudes toward vaccination [36,37,38], identify symptoms associated with the pandemic [39,40,41], and assess affected medical services [42,43,44]. These studies indicate that Google Trends can support a better understanding and analysis of health information-seeking behavior. Information from Google Trends can therefore be used to supplement current infection reports, which are subject to lag times.

This study aimed to explain HFMD infections using Google Trends data from Japan before and during the COVID-19 pandemic.

Methods

Data

We obtained actual HFMD case counts from the weekly reports issued by NIID, which include new infectious cases and sentinel cases by prefecture and have been updated from 1999 to the present [16]. Additionally, we downloaded the relative search volume (RSV) of the “HFMD” search topic from Google Trends for the period from January 1, 2009, to December 31, 2021, setting the geographic location to Japan and the category to health to limit irrelevant results. The normalized RSV represents search interest relative to the highest point for a given region and time. It ranges from 0 to 100, where 0 means that there was insufficient data for a term and 100 marks peak popularity. We selected a search topic instead of the search term “HFMD” to obtain more comprehensive search information. In this study, we used the search topic “HFMD” and the search term “HFMD.” The weekly RSV of the search topic “HFMD” for each year between 2009 and 2021 was gathered for further analysis.
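
The 0 to 100 scaling described above can be illustrated with a minimal sketch. It assumes the raw inputs are per-week search shares, which Google Trends does not expose; the function name `normalize_rsv` and the rounding to integers are illustrative assumptions, not the Google Trends implementation.

```python
def normalize_rsv(shares):
    """Scale hypothetical weekly search shares so that the peak equals 100.

    Assumption: `shares` are non-negative numbers; Google Trends does not
    publish these raw values, so this only mimics the published scale.
    """
    peak = max(shares)
    if peak == 0:
        # No search interest at all: every week stays at 0.
        return [0 for _ in shares]
    return [round(100 * value / peak) for value in shares]
```

Because every value is expressed relative to the peak of the selected region and period, RSV series downloaded for different periods are not directly comparable in absolute terms.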

To identify the significant factors of HFMD infection, we created multiple linear regression models using HFMD-related search terms selected from Japanese tweets. We retrieved Japanese tweets through the publicly available Twitter Stream application programming interface (API) by querying the keyword “HFMD” to identify the top HFMD-related words. Google applied improvements to its data collection system on January 1, 2016, and January 1, 2022. For consistency, the 275,010 tweets downloaded for this study were restricted to the period from 2016 to 2021. We used tokenization to select the top words in the Japanese tweets; it is a fundamental step in many natural language processing (NLP) methods, especially for languages such as Japanese that are written without spaces between words. We tokenized all the tweets and analyzed the unigram tokens. Website links, special characters, numbers, and “amp” (ampersands) were removed from the tweets before tokenization. The Python packages spaCy and GiNZA were used to remove Japanese stop words and implement tokenization. The tokenized words were then joined with white space characters into text in their original order. The Python package scikit-learn was used to convert the white space-joined texts into unigram tokens and calculate the token counts. The token counts are provided in Additional file 1. Search terms with counts larger than 1000 that fell into the categories of infection sources, susceptible sites, susceptible populations, symptoms, treatment, preventive measures, and identified diseases were selected for further analysis. Selected terms with corresponding interpretations and categories are provided in Additional files 2 and 3, respectively. We downloaded the weekly RSV of the selected search terms from Google Trends for two periods, before (2016–2019) and during (2020–2021) the pandemic, divided on the basis of the first case of COVID-19 in Japan on January 16, 2020.
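
The cleaning and counting steps above can be sketched roughly as follows. This is not the pipeline used in the study: it assumes the tweets have already been tokenized and joined with white space (the step handled by spaCy and GiNZA), and it swaps in the standard library `collections.Counter` for scikit-learn so that the example stays self-contained. The regular expressions and function names are illustrative assumptions.

```python
import re
from collections import Counter

# Assumed cleaning patterns: links, "amp" residue, digits, special characters.
URL = re.compile(r"https?://\S+")
NOISE = re.compile(r"&amp;|\bamp\b|[0-9]+|[^\w\s]")

def clean(tweet):
    """Remove website links, ampersand residue, numbers, and special characters."""
    return NOISE.sub(" ", URL.sub(" ", tweet))

def unigram_counts(tweets, tokenize=str.split):
    """Count unigram tokens; `tokenize` defaults to splitting on white space."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(clean(tweet)))
    return counts

def frequent_terms(counts, min_count):
    """Return tokens whose count is larger than `min_count`, most frequent first."""
    return [term for term, n in counts.most_common() if n > min_count]
```

With the threshold described above, `frequent_terms(counts, 1000)` would give the candidate list; assigning terms to categories such as symptoms or preventive measures remains a manual step.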

Statistical analysis

First, we calculated the Pearson correlation coefficient between the HFMD cases and the RSV of the search topic “HFMD” for each year from 2009 to 2021, rather than over the whole period, because of the periodic characteristics of the data. Because both our response variable (HFMD cases) and our explanatory variable (search topic RSV) were measured on a continuous scale, we selected a parametric test rather than a non-parametric analysis [45].
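
As a minimal sketch of this per-year calculation, the following groups weekly observations by year and computes one Pearson coefficient per group. The input layout, a sequence of `(year, cases, rsv)` tuples, and the function names are assumptions made for illustration.

```python
import math
from collections import defaultdict

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def yearly_correlations(rows):
    """Map each year to the correlation between weekly cases and weekly RSV."""
    by_year = defaultdict(lambda: ([], []))
    for year, cases, rsv in rows:
        by_year[year][0].append(cases)
        by_year[year][1].append(rsv)
    return {year: pearson(c, r) for year, (c, r) in sorted(by_year.items())}
```

A year in which either series is constant would make the denominator zero, so such a year would need to be handled separately.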

Second, we conducted cross-correlation analyses between the HFMD cases and the RSV of the selected search terms. Cross-correlation is a measure of the similarity between two series as a function of the displacement of one relative to the other, and is used to objectively estimate the time lag between cases of HFMD infection and related search terms [46]. We set the maximum lag to ± 20 weeks because of the periodic characteristics of HFMD infection. We obtained 40 cross-correlation coefficients for each HFMD-related search term, before and during the COVID-19 pandemic. Next, for each search term, we selected the coefficient with the greatest absolute value and reported its signed value. Finally, we compared these coefficients between the periods before and during the COVID-19 pandemic. We interpreted a negative, zero, or positive lag at the coefficient with the greatest absolute value as indicating that searches for the term occurred earlier than, coincided with, or occurred later than the actual HFMD cases, respectively. Differences between the lags during and before the COVID-19 pandemic were calculated to assess public awareness of HFMD.
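
The lag selection can be sketched as follows, under two stated assumptions: lag k pairs the RSV at week t + k with the cases at week t, so that a negative lag means searches came first, and each coefficient is a plain Pearson correlation over the overlapping weeks rather than the normalization used by a particular statistics package. The function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cross_correlation(rsv, cases, max_lag=20):
    """Return {lag: r}, where lag k pairs rsv[t + k] with cases[t]."""
    n = len(cases)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        weeks = [t for t in range(n) if 0 <= t + lag < n]
        result[lag] = pearson([rsv[t + lag] for t in weeks],
                              [cases[t] for t in weeks])
    return result

def best_lag(rsv, cases, max_lag=20):
    """Pick the lag whose coefficient has the greatest absolute value."""
    ccf = cross_correlation(rsv, cases, max_lag)
    lag = max(ccf, key=lambda k: abs(ccf[k]))
    return lag, ccf[lag]
```

Running `best_lag` once per search term and per period, then subtracting the pre-pandemic lag from the pandemic lag, would give the lag differences compared in the analysis.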

Third, we conducted multiple linear regressions to identify the most important Google search terms for explaining HFMD infection before and during the COVID-19 pandemic. We used the RSV of the HFMD-related search terms as explanatory variables and the actual HFMD cases as the response variable. Collinearity is a correlation between explanatory variables that expresses a linear relationship among them in a regression model. When explanatory variables in the same regression model are correlated, they cannot explain the response in an independent manner. We normalized the RSV of each selected term to avoid collinearity in the regression models. Several common methods have been used for explanatory variable selection to identify the most significant search terms and limit the number of explanatory variables, including forward selection, backward elimination, and stepwise regression [47]. Backward elimination was used in this study to find the best subset of search terms owing to its easy implementation and automated availability. We used a p-value threshold of 0.05 to remove unnecessary search terms. In each round, we removed the search term with the highest p-value and refitted the multiple linear regression model, repeating this until all the p-values were below the threshold.
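
The elimination loop can be illustrated with a small, dependency-free sketch. It is not the software used in the study: it fits ordinary least squares through the normal equations and, as a simplifying assumption, approximates the two-sided p-values with the normal distribution instead of the t distribution, which is only reasonable for fairly large samples. All function names are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_p_values(columns, y):
    """Fit y on an intercept plus `columns`; return {name: approximate p-value}."""
    names = list(columns)
    n, k = len(y), len(names) + 1
    x = [[1.0] + [columns[name][i] for name in names] for i in range(n)]
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    beta = solve(xtx, xty)
    residuals = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(x, y)]
    sigma2 = sum(e * e for e in residuals) / (n - k)
    p_values = {}
    for j, name in enumerate(names, start=1):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        variance = sigma2 * solve(xtx, unit)[j]  # j-th diagonal of (X'X)^-1
        z = beta[j] / math.sqrt(variance)
        p_values[name] = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
    return p_values

def backward_elimination(columns, y, threshold=0.05):
    """Drop the term with the highest p-value until all are below the threshold."""
    kept = dict(columns)
    while kept:
        p_values = ols_p_values(kept, y)
        worst = max(p_values, key=p_values.get)
        if p_values[worst] < threshold:
            break
        del kept[worst]
    return list(kept)
```

Here `columns` maps each search term to its normalized weekly RSV and `y` holds the weekly HFMD cases. A perfectly fitting model would give a zero residual variance, so the sketch assumes noisy data.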

To assess the performance of a linear regression model, the coefficient of determination (R2) or the adjusted R2, both of which indicate how much of the variation in the response is explained by the model, is often used [48]. We selected the adjusted R2 for model evaluation to avoid overfitting.
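
For reference, the standard adjustment penalizes R2 for the number of explanatory variables. The helper below is a minimal sketch of that textbook formula; the function and parameter names are illustrative.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1).

    n_obs is the number of observations (n) and n_predictors the number of
    explanatory variables (p), excluding the intercept.
    """
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)
```

Unlike R2, this value can decrease when an explanatory variable that adds little explanatory power is included, which is why it is preferred when models with different numbers of search terms are compared.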

Results

Basic description of HFMD cases and “HFMD” RSV

Figure 1 presents the actual HFMD cases and the RSV from 2009 to 2021. Visual inspection of the figure indicates that both the actual HFMD cases and the Google Trends RSV peaked around July in most years, except for 2020 and 2021. The number of HFMD infections surged after 2011, peaking every 2 years before 2020. RSV levels coincided with this trend. In 2020, no periodic peak in infection was observed, whereas in 2021, the peak was delayed to November.

Fig. 1 Monthly number of cases and RSV of HFMD from 2009 to 2021

As shown in Fig. 2, we calculated the correlation between the actual HFMD cases and the RSV for each year from 2009 to 2021. The mean coefficient (standard error) was 0.820 (0.052). Most coefficients were greater than 0.7; the exception was 2020, with a coefficient of 0.338. The Pearson correlation coefficients are provided in Additional file 4: Table S1.

Fig. 2 Correlation between the number of cases and RSV of HFMD for each year from 2009 to 2021

Cross-correlation between HFMD cases and search term RSV before and during the pandemic

We performed cross-correlation analyses to determine the temporal relationship between HFMD cases and the search term RSV. Cross-correlation results before and during the pandemic are presented in Table 1.

Table 1 Cross-correlation analysis of HFMD cases and search terms RSV

Compared with period 1, the temporal correlation between HFMD cases and the search term RSV changed in period 2. In period 1, the RSV of 61 (60.4%), 6 (5.9%), and 34 (33.7%) search terms occurred earlier than, coincided with, and occurred later than the HFMD cases, respectively. In period 2, the corresponding numbers were 73 (72.3%), 1 (1.0%), and 27 (26.7%). The differences in lags for 73 (72.3%) search terms were negative, which might indicate increasing public awareness of HFMD infections during the COVID-19 pandemic. In contrast, the lags for five search terms did not change, and 23 search terms exhibited delays.

Essential search terms for explaining HFMD cases before and during the pandemic

We identified the most significant search terms for HFMD infection using multiple linear regression with backward elimination. As shown in Table 2, 18 search terms were significant in period 1, and the model explained 96.7% of the variation in HFMD infection (adjusted R2). In period 2, as shown in Table 3, 57 search terms were significant, with an adjusted R2 of 98.4%; this model therefore included more significant variables than the model for period 1. The model specification formulas are provided in Additional file 4: Table S2 and Table S3.

Table 2 Multiple linear regression model of HFMD cases and RSV of selected search terms from 2016 to 2019
Table 3 Multiple linear regression of HFMD cases and RSV of selected search terms between 2020 and 2021

Compared with period 1, the significant search terms in period 2 retained the same meanings and covered more informative search content. “Herpangina,” “nasal mucus,” “exhaustion,” “diarrhea,” “summer cold,” “young child,” “pediatric,” and “swelling” appeared in their original forms. “Pain,” “reduce fever,” “oral cavity,” “chickenpox,” “itch,” and “sole of the foot” appeared in variant or synonymous forms. “Virus” was replaced by more specific infection sources in period 2, such as “adenovirus,” “entero-,” “coxsackie,” “mycoplasma,” “legionella,” and “hemolytic streptococcus.” Correspondingly, “adult” was superseded by other terms for susceptible populations, such as “child” and “infant” (synonyms omitted). Search terms related to susceptible sites, preventive measures, and treatment were also identified in period 2, such as “daycare center,” “kindergarten,” “handwashing,” “disinfection,” “hospital,” and “specialty drugs.” The multiple linear regression results corroborated the cross-correlation results and indicated that public awareness of HFMD might have increased during the COVID-19 pandemic.

Discussion

This study presented trends and correlations between HFMD cases and the RSV of “HFMD” from 2009 to 2021. Cross-correlation analyses were conducted between HFMD cases and the RSV of search terms before and during the pandemic. Additionally, multiple linear regressions were used to identify significant search terms for explaining HFMD cases in the two periods. Our results indicated that HFMD cases and RSV peaked around July in most years, except in 2020 and 2021, and surged after 2011 with peaks every 2 years before 2020. The search topic “HFMD” exhibited strong correlations with HFMD cases except in 2020, when the COVID-19 outbreak occurred. Furthermore, the cross-correlation and multiple regression results suggested that public awareness of HFMD infection might have improved during the pandemic. To our knowledge, this study is the first to explain HFMD cases using Google search data and to examine how the COVID-19 pandemic affected information-seeking behavior regarding HFMD. Google search data can supplement public health surveillance and help authorities respond rapidly to infectious diseases.

From 2009 to 2021, the RSV of “HFMD” showed trends and peaks similar to those of the HFMD cases, except in 2020. In Japan, HFMD peaks generally occur in July [5]. During the COVID-19 pandemic, unlike in previous years, the HFMD peak disappeared in 2020 and was delayed until November in 2021. In 2020, Google Trends search data did not match the HFMD cases: the RSV of the search topic “HFMD” showed a small peak in July, although the volume was much lower than in previous years. Meanwhile, the Japanese government implemented several measures to control COVID-19, which might have influenced the spread of HFMD. Respiratory droplets and contact are the main routes of infection for both HFMD and COVID-19 [49, 50]. Therefore, the standard precautions taken against COVID-19, such as physical distancing, wearing a mask, regularly washing hands, and coughing into a bent elbow or tissue, also protect the population susceptible to HFMD [51].

Based on our results, the global pandemic might have enhanced public awareness of HFMD in addition to COVID-19. Of the search terms, 73 (72.3%) cross-correlated earlier with HFMD cases during COVID-19, and the significant search terms detected in period 2 were more informative. Previous studies demonstrated that the prevalence of respiratory infectious diseases, such as influenza, varicella, herpes zoster, rubella, and measles, decreased during the COVID-19 pandemic [52,53,54,55,56,57,58]. This might have been due to adherence to non-pharmaceutical interventions and lower non-polio enterovirus activity during the COVID-19 pandemic compared to 2014–2019 [59]. Switzerland had an unprecedented complete absence of pediatric enteroviral meningitis in 2020 [60]. In Japan, admissions for community-acquired pneumonia [61] and influenza [62] decreased during the COVID-19 pandemic. COVID-19 preventive actions and better personal hygiene are beneficial for preventing the spread of such diseases. However, the prevalence of common diseases may rise as the public gradually complies less with infection control measures in the upcoming season [61]. Consistent with this, our results showed that a peak in HFMD infection and public interest recurred in November 2021. Continuous monitoring of HFMD is required, and public information-seeking behaviors may be helpful in public health surveillance.

Our study applied Google search data to explain HFMD cases affected by COVID-19, whereas previous studies used Baidu search data for real-time estimation of HFMD cases in China [15] or other categories of data for HFMD prediction or explanation [11,12,13,14]. In contrast to previous studies, we focused on the difference in information-seeking behavior regarding HFMD before and during COVID-19 and attempted to explain the HFMD cases using data from Google Trends, because Google has the highest search engine market share in Japan [17]. Many researchers have shown that Google search data represent public interest in a specific topic. However, Google search data should be used cautiously as a surveillance system, because large events can easily interfere with it. Combining it with fine-grained data, such as mobility data, could help develop surveillance systems that effectively exclude biased or irrelevant information and respond rapidly [63].

Conclusion

This study described trends and correlations between HFMD cases and the RSV of “HFMD” and identified significant search terms to explain HFMD infections before and during the COVID-19 pandemic. We found that the prevalence of HFMD was abnormal during COVID-19 and that the pandemic might have enhanced public awareness of HFMD infection. It is critical to continuously monitor resurgent common infections as the public gradually reduces compliance with infection control measures. Public information-seeking behavior, as captured by Google search data, may be useful for public health surveillance.

Limitations

This study has several limitations. First, our findings are limited to those who used Google to search for health-related information, which may not represent the entire community. The results may be biased toward younger people, who are more digitally connected than older individuals, although the Google search engine market share in Japan is nearly 80% [17]. Second, Google Trends improved its geographical assignment and data collection systems in 2011, 2016, and 2022. These changes might have affected our results in the basic description of HFMD cases and “HFMD” RSV. Third, search data analysis is hypersensitive to large events; therefore, it should complement rather than replace traditional research methods. Fourth, the specific HFMD-related terms selected from Twitter may not represent all search terms in public use, especially given the hiragana, katakana, kanji, and alphabetic scripts used in Japan. Fifth, we used search data from 2016 to 2019 to represent the period before the COVID-19 pandemic owing to restrictions of Google Trends, and these data may not represent the entire pre-pandemic period. Finally, during backward elimination for regression model construction, significant terms may be eliminated because they are jointly insignificant. Although potentially significant terms could be recovered from the eliminated terms, it is not practical to conduct F-tests multiple times because multiple hypothesis testing lowers the confidence level. Hence, the current results were retained.