Google Trends
Google Trends (GT; Google Inc. [14]) is a dynamic, publicly accessible, internet-based tool for analyzing how frequently terms and keywords are queried on Google Web Search, the most popular search engine worldwide [13]. Search queries can be filtered by geographical location (e.g., country or state), timeframe (real time, or a non-real-time, customized timeframe dating back to January 2004), category (e.g., health or sports), and content (e.g., web search or news search).
GT displays search query frequency as relative search volume (RSV), a normalized value that allows comparison of up to five terms. The procedure is described in detail elsewhere [24]. Briefly, GT divides the number of searches for a particular term by the total number of Google searches for the same location and time period. This adjustment avoids a bias toward total search volume; otherwise, the countries with the most searches overall would always rank highest. These proportions are then rescaled to whole numbers from 0 to 100, reflecting the proportion of searches for a specific topic relative to all search queries on all topics, with 100 marking the highest value within the selected location and timeframe.
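The normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Google's implementation: GT does not expose raw search counts, so the counts are hypothetical, and the rounding rule is assumed. Each term's count is divided by the total search volume of the same location and period, and all compared terms are then scaled jointly so that the single highest proportion becomes 100.

```python
def relative_search_volume(term_counts, total_counts):
    """Sketch of GT-style RSV for up to five compared terms (hypothetical counts).

    term_counts: dict mapping term -> list of search counts per period
    total_counts: list of total Google searches per period (same location)
    """
    if len(term_counts) > 5:
        raise ValueError("GT compares at most five terms at once")
    # Step 1: proportion of all searches in each period (removes total-volume bias)
    shares = {
        term: [t / n for t, n in zip(counts, total_counts)]
        for term, counts in term_counts.items()
    }
    # Step 2: joint rescaling so that the single highest proportion equals 100
    peak = max(max(s) for s in shares.values())
    if peak == 0:
        return {term: [0] * len(s) for term, s in shares.items()}
    # Assumed rounding to whole numbers; Google's exact rounding is not documented
    return {term: [round(100 * x / peak) for x in s] for term, s in shares.items()}


# Hypothetical example: more raw searches in period 2, but a smaller share of all searches
print(relative_search_volume({"nosebleed": [10, 30]}, [1000, 6000]))
# {'nosebleed': [100, 50]}
```

The example shows why the division by total volume matters: the raw count triples in the second period, yet the RSV halves because total search activity grew sixfold.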
Further, GT provides a “related queries” feature, which displays search terms that users who searched for the specified term also searched for. Depending on the timeframe specification, results are displayed in either the “Top” category (the most popular related queries) or the “Rising” category (queries with the largest increase in frequency since the preceding time period). GT also excludes duplicate searches made by the same user within a short timescale.
As noted by other authors [25], GT queries with identical filters performed on different days (or even a few minutes apart) yield slightly different results, apparently because each query draws a different random sample of all searches on Google Web Search. Notably, data queried at different time points showed rather high reliability for countries with large populations, whereas data for smaller countries were much less reliable [25].
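Why smaller countries yield less reliable GT data can be made plausible with a simple sampling model. The sketch below assumes, hypothetically, that each GT request counts a term's searches in an independent random sample drawn with a fixed sampling fraction f; Google does not publish its sampling scheme. Under this binomial assumption, the coefficient of variation of the sampled count is sqrt((1 - f) / (N * f)), where N is the term's true number of searches, so relative noise shrinks as N grows.

```python
import math


def sampled_count_cv(true_searches, sampling_fraction):
    """Coefficient of variation of a Binomial(N, f) sampled count (assumed model).

    true_searches: hypothetical true number N of searches for a term
    sampling_fraction: hypothetical fraction f of searches GT samples per request
    """
    if true_searches <= 0 or not 0 < sampling_fraction <= 1:
        raise ValueError("need N > 0 and 0 < f <= 1")
    n, f = true_searches, sampling_fraction
    # mean = N*f, variance = N*f*(1-f)  ->  CV = sqrt(variance) / mean
    return math.sqrt((1 - f) / (n * f))


# Hypothetical small vs. large country, same sampling fraction
print(sampled_count_cv(100, 0.1))      # small country: about 30% relative noise
print(sampled_count_cv(100_000, 0.1))  # large country: about 0.9% relative noise
```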
Search strategy
Geographical location
Similar to previous work on seasonal variations of health-related issues [16,17,18,19, 26, 27], we selected English- and non-English-speaking countries from both hemispheres as geographical locations for further analysis: Australia, Canada, Germany, Italy, New Zealand, Norway, the United Kingdom (UK), and the United States of America (USA). Because the included countries have different official languages, the first step was to identify country-specific, epistaxis-related search terms. To this end, terms from a brainstorming session among all authors and search terms from previous studies on GT and epistaxis [12, 23] were compiled as primary search terms (Table 1).
Table 1 Primary search terms (previously published and brainstorming) and the respective search terms with the highest relative search volume, which were subsequently selected for further analyses
Timeframe
All GT queries covered the timeframe from January 1, 2004 to September 30, 2019, the longest period of GT records available. Furthermore, the “health” category was selected to capture health-specific interest. In accordance with previously published studies, winter months were defined as December, January, February, and March for the northern hemisphere, and June, July, August, and September for the southern hemisphere [15,16,17, 28]. The present study followed the checklist for the use of GT in health care research by Nuti et al. [21].
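The hemisphere-specific winter definition can be expressed as a small lookup. The country-to-hemisphere assignment follows the eight countries listed above; the function and variable names are illustrative only.

```python
# Winter months as defined above (1 = January, ..., 12 = December)
WINTER_MONTHS = {
    "north": {12, 1, 2, 3},
    "south": {6, 7, 8, 9},
}

# Hemisphere of each included country
HEMISPHERE = {
    "Australia": "south",
    "New Zealand": "south",
    "Canada": "north",
    "Germany": "north",
    "Italy": "north",
    "Norway": "north",
    "United Kingdom": "north",
    "United States of America": "north",
}


def is_winter(country, month):
    """Return True if `month` (1-12) is a winter month in `country`'s hemisphere."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return month in WINTER_MONTHS[HEMISPHERE[country]]


print(is_winter("Germany", 1), is_winter("Australia", 1))  # True False
```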
Country-specific epistaxis-related search terms
GT was accessed on October 20, 2019, and the most representative (most relevant) country-specific search terms were selected using a systematic, stepwise approach [25]. First, the GT “related queries” function was used to add further epistaxis-related search terms to the primary search terms (Supplement Table 1). Second, all collected search terms were compared with each other using the GT comparison function. Because GT allows no more than five search terms to be compared simultaneously, the term with the highest RSV was retained in each comparison while the remaining terms were swapped in successively. This step was crucial, as we wanted to base further analyses on the country-specific epistaxis-related search term with the highest RSV, to avoid Type I errors resulting from non-representative or insufficient RSV.
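The stepwise comparison can be sketched as a "king of the hill" loop: the current leader stays in every comparison and up to four challengers fill the remaining slots, so no comparison exceeds GT's limit of five terms. `compare` is a hypothetical stand-in for a manual GT comparison that returns one summary RSV per term (e.g., its mean over the timeframe); it does not correspond to any Google API, and the popularity values are invented for illustration.

```python
def select_top_term(candidates, compare, max_terms=5):
    """Return the candidate with the highest RSV using batches of <= max_terms.

    candidates: list of search terms (at least one)
    compare: hypothetical callable mapping a list of terms -> {term: RSV}
    """
    if not candidates:
        raise ValueError("need at least one candidate term")
    leader = candidates[0]
    remaining = list(candidates[1:])
    slots = max_terms - 1  # one slot is always reserved for the current leader
    while remaining:
        batch, remaining = remaining[:slots], remaining[slots:]
        rsv = compare([leader] + batch)
        leader = max(rsv, key=rsv.get)  # term with the highest RSV in this comparison
    return leader


# Hypothetical popularity values standing in for GT's comparison output
popularity = {"nosebleed": 80, "nose bleed": 55, "epistaxis": 10,
              "bloody nose": 60, "nosebleed causes": 20, "nasal bleeding": 5}


def fake_compare(terms):
    # Mimics GT rescaling within each comparison: the top term gets 100
    top = max(popularity[t] for t in terms)
    return {t: round(100 * popularity[t] / top) for t in terms}


print(select_top_term(list(popularity), fake_compare))  # nosebleed
```

Because GT rescales RSV within each comparison, keeping the current leader in every batch acts as a common anchor, so the winner of each batch can be carried forward without comparing values across separate requests.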
Validity and reproducibility of GT data
The second step assessed the validity and reproducibility of GT data queried at different time points, using the most relevant country-specific search terms. To this end, monthly data were queried daily from October 20 to October 27, 2019. Comma-separated values (CSV) files were downloaded on these eight consecutive days, yielding 189 data points per country and day (15 years × 12 months + 9 months). Each day's download represented an individual time series, whereas the averaged time series of each country was calculated as the mean of the data queried on these eight days.
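The bookkeeping in this step can be sketched as follows: a monthly index from January 2004 to September 2019 (confirming the 189 data points) and the element-wise mean of the eight daily downloads. Parsing of the downloaded CSV files is omitted because it depends on the file layout; the function names are illustrative.

```python
def month_index(start=(2004, 1), end=(2019, 9)):
    """List 'YYYY-MM' labels from start to end inclusive."""
    (y, m), labels = start, []
    while (y, m) <= end:
        labels.append(f"{y:04d}-{m:02d}")
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return labels


def average_series(daily_series):
    """Element-wise mean of several equally long time series (one per query day)."""
    lengths = {len(s) for s in daily_series}
    if len(lengths) != 1:
        raise ValueError("need at least one series, all of the same length")
    k = len(daily_series)
    return [sum(values) / k for values in zip(*daily_series)]


months = month_index()
print(len(months), months[0], months[-1])  # 189 2004-01 2019-09
```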
Population data
To assess a possible correlation between population size and the reliability of time series data, the number of inhabitants (in millions [29]) of each included country was also recorded on October 27, 2019: Australia 25.31; Canada 37.53; Germany 83.62; Italy 60.54; New Zealand 4.80; Norway 5.39; United Kingdom 67.66; United States of America 329.74.
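The statistical analysis names Spearman rank correlation for bivariate correlations such as this one. A dependency-free sketch: values are ranked (tied values receive their average rank) and Pearson's correlation is computed on the ranks. The function names are illustrative; in practice one would pass the eight population sizes listed above and the eight corresponding country-level reliability estimates. In Python, `scipy.stats.spearmanr` offers a library alternative.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for p in range(i, j + 1):
            ranks[order[p]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(sxx * syy)


print(spearman_rs([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.8
```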
Climate data
Climate data reported herein were retrieved from the Climate Data Center (CDC) of the German Weather Service (Deutscher Wetterdienst, DWD), a public institution and agency of the German Federal Ministry of Transport and Digital Infrastructure (BMVI) that provides publicly accessible, web-based data. Meteorological data used in this study comprised monthly summaries recorded by meteorological stations in Berlin (Berlin-Tegel) and Munich (Munich-Airport), the two most densely populated cities in Germany. We retrieved the following parameters (monthly means): temperature (air temperature 2 m above ground, °C), sunshine duration (hours), humidity (relative humidity 2 m above ground, %), air pressure (station level, hPa), and vapor pressure (hPa). All climate data covered the timeframe from January 1, 2004 to September 30, 2019, matching that of the GT searches.
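Because the climate series were matched to the timeframe of the GT searches, both monthly series must be restricted to the same months before they are analyzed together. A minimal sketch, assuming each series is held as a dictionary keyed by 'YYYY-MM' labels (a hypothetical in-memory layout, not the DWD or GT file format); the values shown are placeholders, not DWD measurements or study data.

```python
def align_monthly(rsv, climate, start="2004-01", end="2019-09"):
    """Pair RSV and climate values for months present in both series.

    rsv, climate: dicts mapping 'YYYY-MM' -> monthly value (hypothetical layout)
    start, end: inclusive study window as 'YYYY-MM' strings
    Returns (months, rsv_values, climate_values) in chronological order.
    """
    # Zero-padded 'YYYY-MM' strings sort chronologically, so string comparison works
    months = sorted(m for m in rsv.keys() & climate.keys() if start <= m <= end)
    return months, [rsv[m] for m in months], [climate[m] for m in months]


rsv = {"2003-12": 40, "2004-01": 55, "2004-02": 61, "2004-03": 48}
temp_berlin = {"2004-01": 1.2, "2004-02": 3.9, "2004-04": 10.1}  # placeholder °C values
print(align_monthly(rsv, temp_berlin))
# (['2004-01', '2004-02'], [55, 61], [1.2, 3.9])
```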
Statistical analysis
Cosinor analysis was used to assess seasonal variations in RSV for the above-mentioned time series and was performed in two steps: (a) data visualization and (b) model fitting. The underlying methods are presented in detail elsewhere [30]. In short, cosinor analysis is a parametric seasonal model that fits a sinusoid over a predefined timeframe as part of a generalized linear model and estimates the amplitude (size) A and the phase (peak) P, based on a cycle length c of 12 (annual seasonal cycle = 12 months). Hence, the peak occurs once a year and the low point (nadir) L lies 6 months from the peak (peak ± 6 months). Because the seasonal variation in the cosinor model is characterized by both a sine and a cosine term, statistical significance was tested within the generalized linear model at a significance level of p < 0.025 to adjust for multiple comparisons. Subsequently, to quantify the reliability of individual and averaged time series data, intraclass correlation coefficients were calculated using the two-way random-effects model according to Shrout and Fleiss [31]. Bivariate correlations were calculated using Spearman rank correlation (rs). Data analysis and visualization were carried out using the “season” and “psych” packages in R 3.5.1 (R Development Core Team, 2008; R Foundation for Statistical Computing, Vienna, Austria) and GraphPad Prism 8.2 (GraphPad Software, Inc., La Jolla, CA).
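For readers who want to see the model explicitly: with months indexed t = 1, 2, ... (t = 1 for January 2004) and c = 12, the cosinor model can be written as RSV_t = M + A·cos(2πt/c - P) + ε_t, where M is the mean level. This expands to M + β1·cos(2πt/c) + β2·sin(2πt/c) with A = sqrt(β1² + β2²) and P = atan2(β2, β1); the peak lies at P·c/(2π) and the nadir c/2 = 6 months away. The sketch below fits this by ordinary least squares (a Gaussian GLM with identity link) without dependencies. It is not the `season` package's `cosinor` implementation, its phase convention may differ, and significance testing is omitted.

```python
import math


def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_cosinor(y, period=12):
    """Least-squares cosinor fit of y_t = M + b1*cos(wt) + b2*sin(wt), t = 1..n."""
    w = 2 * math.pi / period
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in range(1, len(y) + 1)]
    # Normal equations (X'X) beta = X'y for the 3-column design matrix
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(3)]
    mesor, b1, b2 = _solve(xtx, xty)
    amplitude = math.hypot(b1, b2)
    phase = math.atan2(b2, b1)            # P in radians
    peak = (phase / w) % period           # peak position on the t scale
    nadir = (peak + period / 2) % period  # half a cycle away from the peak
    return {"mesor": mesor, "amplitude": amplitude, "peak": peak, "nadir": nadir}


def to_calendar_month(position):
    """Map a position on the t scale (t = 1 is January, 12-month cycle) to 1..12."""
    m = round(position) % 12
    return 12 if m == 0 else m


# Synthetic check: a January peak of amplitude 10 around a mean level of 50
y = [50 + 10 * math.cos(2 * math.pi * (t - 1) / 12) for t in range(1, 190)]
fit = fit_cosinor(y)
print(round(fit["amplitude"], 6), to_calendar_month(fit["peak"]), to_calendar_month(fit["nadir"]))
# 10.0 1 7
```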
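The two-way random-effects intraclass correlation of Shrout and Fleiss [31] can be computed from a months × query-days table of RSV values. We assume here that the single-measure coefficient ICC(2,1) corresponds to the reliability of an individual daily series and the average-measure coefficient ICC(2,k) to that of the mean of k daily series; in R, `psych::ICC` reports these as ICC2 and ICC2k. A minimal sketch based on the two-way ANOVA mean squares, checked against the worked example in Shrout and Fleiss's paper:

```python
def icc_two_way_random(table):
    """ICC(2,1) and ICC(2,k) after Shrout & Fleiss (1979).

    table: list of n rows (targets, e.g. months), each with k ratings
           (e.g. RSV values downloaded on k different days)
    Returns (icc_single, icc_average).
    """
    n = len(table)
    k = len(table[0]) if table else 0
    if n < 2 or k < 2 or any(len(row) != k for row in table):
        raise ValueError("need a rectangular table with n >= 2 rows and k >= 2 columns")
    grand = sum(map(sum, table)) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_rows = k * sum((r - grand) ** 2 for r in row_means)  # between targets
    ss_cols = n * sum((c - grand) ** 2 for c in col_means)  # between raters (days)
    ss_err = ss_total - ss_rows - ss_cols                   # residual
    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = ss_err / ((n - 1) * (k - 1))
    icc_single = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    icc_average = (bms - ems) / (bms + (jms - ems) / n)
    return icc_single, icc_average


# Worked example from Shrout & Fleiss (1979): 6 targets rated by 4 judges
example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
single, average = icc_two_way_random(example)
print(round(single, 2), round(average, 2))  # 0.29 0.62
```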