Public health surveillance methods continue to evolve to improve the timeliness and accuracy of reporting. Improvement in these surveillance system characteristics is particularly relevant to mass gathering settings, which may strain the resources of the host community, introduce new health risks, and at the same time be high profile for the community (McCloskey et al. 2014; World Health Organization 2015). The population influx associated with these events can increase the risk of communicable disease transmission through crowding and the importation of new infectious microbes, and can also lead to health impacts from the activities involved, such as environmental exposures from extended time outdoors. Toronto, Canada, was the host city of the 2015 Pan American and Parapan American Games (P/PAG), held from July 10–26 and August 7–15, 2015, respectively. The P/PAG is a sporting event that draws thousands of athletes, visitors, officials, and media personnel from 41 countries (The Toronto Organizing Committee for the 2015 Pan American and Parapan American Games 2015). With a large population influx expected over an extended duration, surveillance was a key activity for public health agencies in preparedness and response to P/PAG (Ontario Agency for Health Protection and Promotion (Public Health Ontario) 2015).

Syndromic surveillance uses pre-diagnostic data, and the inclusion of syndromic data sources in public health surveillance for mass gatherings has been shown to be useful (McCloskey et al. 2014). Interest in the implementation of novel or non-routine data sources for syndromic surveillance data has largely been driven by an aim of more timely notification and tracking of health risks (Henning 2004). Syndromic surveillance ideally leads to more rapid identification of risks than is possible by traditional surveillance sources that are primarily dependent on disease reporting based on laboratory testing. The ubiquitous use of mobile phones and the Internet has further led to consideration of novel data sources for syndromic surveillance (Keller et al. 2009; Brownstein et al. 2009; Charles-Smith et al. 2015; Jia and Mohamed 2015). Social media platforms such as Twitter capture population comments and thoughts that could indicate a health issue. By extracting relevant tweets based on keywords related to a health syndrome or condition of interest, social media has been used as a data source during emergencies such as Hurricane Isaac and the Boston Marathon explosions (Keller et al. 2009; Cassa et al. 2013; Bennett et al. 2013). A recent systematic review, however, indicated that the evidence base on the use of social media in public health surveillance practice has many gaps (Charles-Smith et al. 2015). In particular, this review identifies that a specific challenge exists in translating research using social media for surveillance into practice and notes that under-representation of social media analytics in active surveillance may be due to a lack of resources or required technical skills. 
While there was insufficient evidence to support including social media as an official data source for P/PAG surveillance, our team sought to leverage our combined analytic expertise relevant to social media and surveillance in order to pursue further research to understand the practical role of social media in public health surveillance for mass gatherings.

This proof-of-concept study examined Twitter as a data source for public health surveillance during a mass gathering in Canada. We had two objectives. First, we aimed to explore the feasibility of acquiring, categorizing, and using Twitter data. Second, we aimed to compare Twitter data against existing data sources used for P/PAG surveillance.


An ecological time series design was used to achieve the study objectives. The study period included a pre- and post-Game interval for data collection, from June 26 to September 10, 2015. The study setting was Toronto, Canada. To achieve the second objective, four syndromes were selected based on risks assessed through a Hazard Identification Risk Assessment process carried out during the planning phase of the P/PAG: respiratory, gastrointestinal (GI), heat-related illnesses, and influenza-like illness (ILI) (Citron and Khan 2013). Definitions for the syndromes for all data sources, aligned with syndrome definitions for emergency department (ED) syndromic surveillance, are described in more detail below.

Development of a Twitter-based data source

To achieve the first objective, syndromes were created based on keyword categorization and used for extraction from Twitter. A list of potential keyword terms was identified for each syndrome through multiple steps. First, an initial list of keywords was generated for the syndromes based on a review of recent publications that used Twitter in public health surveillance and monitoring (Aslam et al. 2014; Denecke et al. 2013; Broniatowski et al. 2013). Second, term lists were augmented by terminology from ED chief complaint data used for syndromic surveillance. Definitions from an ED-based syndromic surveillance ontology were reviewed as relevant to the syndromes of interest for this study (BLU Lab (University of Pittsburgh), The Surveillance Lab (McGill Clinical and Health Informatics Research), NLP Research Group (National Institute of Informatics, Japan) n.d.). To account for the different language that may be used for ED chief complaint data, compared with tweets, the lexical database WordNet was employed to supplement corresponding synonyms and regular expressions expected to be used on Twitter (WordNet 2015). Third, term lists for each syndrome were refined based on a consensus meeting including research team members with expertise in syndrome surveillance approaches (DB, IJ, YK, GL, KM). When applicable, a relative emphasis on over-inclusion of potentially relevant terms was used in final term selection to ensure term list comprehensiveness. The final list of terms is provided as Supplementary Table 1.

The extraction of Twitter data was done in a two-step process. Twitter’s public streaming application programming interface (API) provides free real-time access to a 1% random sample of all tweets. This 1% of data is publicly available and free to all public health agencies wishing to analyze it. In step 1, all tweets matching any of the keywords were collected and stored, regardless of geographic location (Elasticsearch 2017). Using the term lists described previously, each tweet was classified into one or more of the four target syndromes of interest (i.e., respiratory, GI, heat-related, and ILI). Tweets could be classified into multiple syndromes; for example, “cough” appears in both respiratory and ILI syndromes and “muscle spasm” appears in both ILI and heat syndromes. In step 2, tweets were filtered to a geographical boundary consistent with the City of Toronto, using the Global Positioning System (GPS) location or user profile information collected with each tweet. Specifically, tweets were identified as being from Toronto if users had enabled their GPS coordinates and were located in Toronto at the time they tweeted, or if they identified their location as Toronto within their Twitter profile.
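The two steps can be illustrated with a minimal Python sketch. It is not the study's implementation: the term lists are abbreviated and hypothetical apart from the overlapping "cough" and "muscle spasm" examples given above (the full lists are in Supplementary Table 1), the `Tweet` record is an illustrative stand-in for the stored tweet, and the bounding box is only a rough approximation of the City of Toronto boundary.

```python
import re
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical, abbreviated term lists for illustration only.
# "cough" and "muscle spasm" overlap across syndromes as described in the text.
SYNDROME_TERMS = {
    "respiratory": ["cough", "wheezing", "shortness of breath"],
    "gi": ["vomiting", "diarrhea"],
    "heat": ["heat stroke", "sunburn", "muscle spasm"],
    "ili": ["flu", "fever", "cough", "muscle spasm"],
}

# One case-insensitive, whole-word pattern per syndrome.
PATTERNS = {
    syndrome: re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    for syndrome, terms in SYNDROME_TERMS.items()
}

# Assumed rough bounding box for Toronto (not the study's boundary definition).
LAT_MIN, LAT_MAX = 43.58, 43.86
LON_MIN, LON_MAX = -79.64, -79.12


@dataclass
class Tweet:
    """Illustrative record; field names are assumptions, not Twitter's schema."""
    text: str
    lat: Optional[float] = None
    lon: Optional[float] = None
    profile_location: str = ""


def classify(text: str) -> Set[str]:
    """Step 1: return every syndrome whose term list matches the tweet text."""
    return {s for s, pattern in PATTERNS.items() if pattern.search(text)}


def is_toronto(tweet: Tweet) -> bool:
    """Step 2: keep tweets with GPS inside the box or a Toronto profile location."""
    gps_hit = (
        tweet.lat is not None
        and tweet.lon is not None
        and LAT_MIN <= tweet.lat <= LAT_MAX
        and LON_MIN <= tweet.lon <= LON_MAX
    )
    profile_hit = "toronto" in tweet.profile_location.lower()
    return gps_hit or profile_hit
```

For example, `classify("Bad cough all night")` returns both the respiratory and ILI syndromes, which mirrors the multi-syndrome classification described above. Simple substring matching on the profile field is a deliberate simplification and would also accept locations that merely mention Toronto.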

Comparator data sources

The frequencies over time of the tweet-based syndromes were compared to other data sources used to monitor the same syndromes. Twitter data were thus compared with more routine public health surveillance data sources at an aggregate level, over the fixed time period of the P/PAG. The data sources used for comparison were selected based on their availability (i.e., existing streams of data) and their demonstrated ability to monitor the chosen syndromes during the P/PAG. These were chief complaint data from emergency department visits (Kingston, Frontenac and Lennox & Addington Public Health n.d.), toll-free telephone health helpline calls (Government of Ontario 2014), laboratory testing positivity rate, mandatory disease reporting systems, and temperature. All data were for residents of Toronto.

The comparator data sources are outlined in Table 1, which includes the syndrome for which the data source is used as a comparator, the source data type and name, the database, and other features used to narrow down comparators. Each comparator was restricted to either Toronto residents or a Toronto location (as relevant to the source).

Table 1 Comparator data sources used for Toronto and relevant characteristics

Data analysis

Syndrome counts were aggregated daily during the study period for each data source for those residing or situated in the City of Toronto, as relevant. Descriptive statistics by syndrome were calculated for Twitter count data; in addition, total tweets were stratified by keywords to determine the proportion of total tweets each keyword generated.
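As an illustration of the aggregation step, the following Python sketch counts records per calendar day and computes the share of tweets attributable to each keyword. The function names and input shapes are assumptions made for this example; the study's own processing is not described at this level of detail.

```python
from collections import Counter
from datetime import date, datetime, timedelta
from typing import Dict, Iterable


def daily_counts(timestamps: Iterable[datetime], start: date, end: date) -> Dict[date, int]:
    """Count records per calendar day from start to end inclusive.

    Days with no records are kept with a count of 0 so the series has no gaps.
    """
    counts = Counter(ts.date() for ts in timestamps)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    return {day: counts.get(day, 0) for day in days}


def keyword_shares(matched_keywords: Iterable[str]) -> Dict[str, float]:
    """Return the proportion of matched tweets generated by each keyword."""
    counts = Counter(matched_keywords)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {keyword: n / total for keyword, n in counts.items()}
```

Keeping zero-count days explicit matters for the later time series comparison, since a missing day would otherwise shift the alignment between data sources.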

Cross-correlation analysis was conducted for each Twitter syndrome compared with existing surveillance data sources. Cross-correlation coefficients were determined for time lags of −5 to +2 days and were considered statistically significantly different from 0 at the 5% level. Supplementary analysis was done to examine correlations between heat syndrome and temperature. Temporal correlation within each time series was also assessed using autocorrelation plots.
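The lagged comparison can be sketched in Python as follows. This is an illustration only, not a reproduction of the SAS analysis: it uses the standard sample cross-correlation, assumes a lag convention in which a positive lag k pairs x on day t with y on day t + k, and flags significance with the common large-sample approximation |r| > 1.96/√n, which assumes the series are otherwise uncorrelated.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple


def cross_correlation(x: Sequence[float], y: Sequence[float], lag: int) -> float:
    """Sample cross-correlation between x[t] and y[t + lag]."""
    n = len(x)
    if n != len(y):
        raise ValueError("series must have the same length")
    if abs(lag) >= n:
        raise ValueError("lag must be smaller than the series length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("cross-correlation is undefined for a constant series")
    if lag >= 0:
        pairs = zip(x[: n - lag], y[lag:])
    else:
        pairs = zip(x[-lag:], y[: n + lag])
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in pairs) / n
    return covariance / (sd_x * sd_y)


def ccf_table(
    x: Sequence[float], y: Sequence[float], lags: Iterable[int] = range(-5, 3)
) -> Dict[int, Tuple[float, bool]]:
    """Map each lag (default -5..+2 days) to (coefficient, significant at ~5%)."""
    threshold = 1.96 / math.sqrt(len(x))  # approximate bound, an assumption
    table = {}
    for lag in lags:
        r = cross_correlation(x, y, lag)
        table[lag] = (r, abs(r) > threshold)
    return table
```

With daily Twitter counts as `x` and a comparator series as `y`, `ccf_table(x, y)` returns one coefficient per lag in the window used in this study. Because daily surveillance series are often autocorrelated, the simple threshold here can overstate significance; prewhitening the series before cross-correlating is a common refinement.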

All statistical analyses were conducted using SAS version 9.3. Ethics approval was obtained from the Public Health Ontario Ethics Review Board.


The implemented approach allowed the successful extraction of tweets for the four syndromes from the 1% Twitter sample. There was continuous extraction of relevant tweets over the study window, save for one outage. From the evening of Friday, July 17th, until the morning of Monday, July 20th, a connectivity error resulted in failure to capture tweets in real-time. To address this outage, missing tweets were later extracted using the Twitter API and the time series in-filled with the missing data.

Overall, the daily frequency of tweets classified into each syndrome was low. The mean numbers of daily tweets for ILI, respiratory syndromes, GI illness, and heat syndrome were 22.0 (standard deviation (SD) 6.0), 21.6 (SD 6.0), 6.4 (SD 2.8), and 4.1 (SD 3.0), respectively.

A total of 88 cross-correlations were tested and these results are provided as Supplementary Table 2. The frequency of Twitter syndromes was correlated with the frequency of various other surveillance data sources. The ED data were correlated with the GI syndrome, the telephone helpline data were correlated with the heat syndrome, and the laboratory data were correlated with the respiratory syndrome (Table 2). The ED GI syndrome and laboratory respiratory syndrome correlations were weak and at various temporal lags, while those between the telephone helpline and heat syndrome occurred over a range of days, with the highest correlation at a zero time lag (Figs. 1 and 2). Negative correlations were also found between the Twitter and ED data for the respiratory and ILI syndromes.

Table 2 Statistically significant cross-correlation coefficients with Twitter data
Fig. 1 Daily tweet counts compared with telephone helpline calls for heat syndrome

Fig. 2 Cross-correlation of Twitter and telephone helpline data for heat syndrome

The level of correspondence in the Twitter and telephone health helpline data is illustrated in a plot of tweets and percent of telephone health helpline calls related to the heat syndrome (Fig. 1). Figure 2 displays the trend of statistically significant correlations of Twitter and telephone health helpline calls for heat. No statistically significant correlations were observed for the ILI syndrome or between the Twitter counts and reportable disease data. Correlations between heat-related tweets and temperature data were statistically significant (r = 0.5) over a range of days, as displayed in Fig. 3.

Fig. 3 Cross-correlation of Twitter heat syndrome and temperature


This study examined Twitter as a data source for public health surveillance during a Canadian mass gathering. The data were accessed in real-time and analyzed daily, providing a timely data source relevant to public health practice. Given the previously identified challenge in translating research using social media for public health surveillance into practice, this study represents an important contribution to applied public health research (Charles-Smith et al. 2015).

The Twitter syndromes were compared against other surveillance data sources over the P/PAG period. There were no communicable disease outbreaks during the P/PAG time period, and the only public health incidents that occurred during this period were several heat alerts (Chan et al. 2017). In this study, we demonstrate that the Twitter syndromes were sensitive to the heat alerts, as Twitter data were correlated with the telephone health helpline (Telehealth) data for heat syndrome and also correlated with maximum daily temperature. The peak correlations in the Twitter signal were on the same day as peaks in telephone helpline and temperature data. Based on this study, the addition of Twitter data to surveillance did not contribute to increased timeliness for mass gathering surveillance. This is an important finding given that the inclusion of novel sources of surveillance data aims to generally increase the timeliness of detection; however, Twitter data also includes a geospatial dimension which is unique and potentially useful for identifying the location of a public health issue. The weak GI and respiratory correlations that were found did not persist over a range of days and did not demonstrate a persistent alert as the heat correlations did. The negative cross-correlations found for respiratory and ILI syndrome, as well as the positive GI and respiratory correlations, were difficult to interpret given the lack of any detectable communicable disease events during the P/PAG. It is reassuring, however, that there were no peaks or aberrations noted in the Twitter data that could have been misinterpreted as health events. While it was not possible to assess the sensitivity or positive predictive value of Twitter for all public health risks identified for the P/PAG, the temporal correlations for the Twitter heat syndrome with heat alerts declared during the event provide support for the potential value of Twitter as a data source.

We were able to acquire, categorize, and use Twitter data that are openly accessible to the public. Using this source of Twitter data likely contributed to the low tweet counts, given that we accessed only a 1% random sample of tweets for our analysis. The scope of this study did not allow the team to access, store, and analyze all tweets, referred to as the Twitter “firehose,” during the study period (Morstatter et al. 2013; Twitter 2017). Access to firehose data could be purchased but is not openly accessible; therefore, this study represents a practical use and analysis of Twitter data that would be accessible to all public health agencies. In addition, the low counts may relate to the fact that tweets were identified as being from Toronto based on user preferences or their Twitter profile. The number of Twitter users who are GPS-enabled or have a location identified in their profile is estimated to be smaller than the actual number of individuals tweeting (Burton et al. 2012).

The system and resource-based considerations are highly relevant for public health agencies involved in data collection and analysis of novel or non-routine data. In 2013, Denecke et al. published a set of recommendations for establishing systems to exploit Twitter for public health monitoring, which included implementing machine-based learning capability to help mitigate limitations in the specificity of syndromes (Denecke et al. 2013). Further, additional techniques could be considered to analyze the content of tweets beyond the keywords, to better understand context or sentiment (Mollema et al. 2015). Since increased timeliness of Twitter data was not observed in this study, the analysis of context, sentiment, or geolocation implications may represent an important area for future study. The size and capacity of local public health agencies vary, and the added cost of implementing infrastructure for collecting and analyzing social media data may not be practical for all public health agencies. Centralized data infrastructure for public health surveillance might best address resource considerations, although more evidence is needed to warrant such an investment.

In this study, we found that surveillance of syndromes defined from tweets can detect an increase in heat-related illness during a mass gathering where heat alerts were the only public health event. The absence of major spikes in the Twitter syndromes for communicable diseases also supports the finding in a previous study that social media data sources can be used for reassurance during high profile events when no signals have been detected from multiple data sources and, thus, support ongoing activities for the event (McCloskey et al. 2014). In moving from our study to routine public health practice, there are additional considerations for the validation of a Twitter-based surveillance system. The syndromes we used were created based on methods that have been applied in ED syndromic surveillance (BLU Lab (University of Pittsburgh), The Surveillance Lab (McGill Clinical and Health Informatics Research), NLP Research Group (National Institute of Informatics, Japan) n.d.). It is possible that the same approach used for chief complaint-based syndromes in ED surveillance may not translate well to Twitter which is based on expressions on social media. To mitigate this, we added regular expressions using a lexical database; however, it is possible that this did not capture all of the expressions that may be used by the public on Twitter. This could also increase the likelihood of correlations with ED data. The background level of “noise” on Twitter may affect specificity of the syndrome, which is important to consider for future study. For example, some keywords can be used colloquially or in reference to non-health-related events or expressions (e.g., uses of the word “hot”) and may therefore not be valid for the syndrome of interest. As well, health-related tweets may not indicate infection or health impacts, as media organizations and health interest groups may use Twitter to raise awareness and communicate with the public. 
As Denecke suggests, machine-based learning can be used as a strategy to address aspects of signal to noise (Denecke et al. 2013). Future work would therefore benefit from further validation of the Twitter syndromes and consideration for machine-based learning algorithms or content analysis. In addition, further research is needed to examine the role for social media in surveillance for other mass gathering contexts, considering implications of the type of event, seasonality, and the host population.


Twitter data can be used for public health surveillance during mass gatherings. Using a simple system, based on keyword extraction representing health syndromes of interest, we demonstrated potential public health surveillance value for Twitter data. Tweets related to heat alerts, the main public health event detected by any means during the P/PAG, were found to be correlated with both telephone helpline and temperature datasets. There was no evidence of increased timeliness with Twitter data. Further research is needed to substantiate the role of Twitter and other social media sources in both routine and enhanced public health surveillance, including validated Twitter-based syndromes and augmented system infrastructure to include filters and machine-based learning. Resource implications for local public health agencies and potential for more centralized data infrastructure also warrant future study.