1 Introduction

Social media is an open-source, real-time outlet for sharing thoughts, actions, or feelings with others. Researchers have taken advantage of this data source as a complement to traditional disease surveillance in an effort to increase the speed of detecting disease outbreaks or other health issues within populations [1]. In addition to using social media for actionable surveillance, the way the information is expressed (i.e., through specific language characteristics) can provide insight into how illness affects individuals emotionally and how they are coping with disease.

Influenza-like illness (ILI) is one of the most widely studied syndromes using social media, most likely because it recurs seasonally in a high proportion of populations and has easily identifiable symptoms. In addition, ILI is highly contagious (airborne) and can range in severity depending on the causal agent, e.g., from a non-debilitating common cold to a deadly influenza strain. The potential implications of this disease for human health and health care systems are drastic. Normal seasonal influenza outbreaks alone cause up to 500,000 deaths annually worldwide and 50,000 deaths a year in the U.S. [2]. The Centers for Disease Control and Prevention (CDC) has five different types of influenza surveillance, i.e., outpatient illness, virologic, mortality, hospitalization, and geographic spread surveillance systems, and provides official summary reports 1 to 2 weeks after patients are seen. This lag in notification is due to the required chains of data collection, verification, and reporting when dealing with human health data. The time lag is overcome with the use of open source social media data, where people ‘report’ their own health conditions, although the validity and credibility of the data are reduced. Regardless, many studies have shown a reduction in the time to ILI outbreak detection using available online data signals, such as search query logs (e.g., Google Flu Trends [3–5] and Yahoo [6]), health-related web page views (e.g., Wikipedia [7, 8]), self-reported illness (e.g., crowdsourced reports (Flu Near You) [9, 10] and social media (Twitter) [11–16]), and combined data sources [17].

To date, studies using social media for ILI surveillance, including those mentioned above, have relied on health-related text from the user, e.g., self-reporting of symptoms, syndrome or actual illness, in their analyses. However, social media analytics can provide insight into the user’s affects (defined as emotions and sentiments) based on the tweet itself. Many studies in socio- and psycholinguistics show that a person’s affects influence their health status, namely positive and negative affects correlating with good and bad health, respectively [18–21]. Therefore, affects expressed in social media may be indirectly related to the health status of the user. Limited work has been done in this area, namely efforts to predict emotions and mental illness for civilian populations [22, 23] and to correlate CDC-estimated vaccination rates with vaccination sentiments identified through social media [24].

In this paper, we focus on affect signals originating from Twitter social media surrounding military bases and compare them to the number of ILI visits recorded at military health facilities in the same area. Military communities present a unique case study because they are semi-closed populations of people who share common location, responsibilities, and way of life. They actively use social media to stay connected with their unit at home or with friends and family when deployed [25]. In addition, ILI has the ability to quickly spread throughout the military population resulting in disruptions in military operations and concern for national security [26].

To the best of our knowledge, this is the first work that studies the relationships between affects expressed via social media and military ILI data. This study is also the largest published study of the effect of influenza on military communities. Past studies targeting military population affects are limited to small-scale, designed experiments (e.g., investigating emotions in military veterans via questionnaires [27], measuring emotions in military speech [28], studying the relationship between occupational stress and mental illness in the military [29], and evaluating the health-related quality of life of United States (U.S.) military personnel [30, 31]). The largest study on mental-health risk among the U.S. military, done jointly by the U.S. Army and the U.S. National Institute of Mental Health, reports that the rate of major depression is five times higher among military personnel than among civilians, intermittent explosive disorder is six times higher, and Post-Traumatic Stress Disorder (PTSD) is nearly 15 times higher [32]. Tracking changes in the emotions and sentiments that military personnel and their families express via social media, over time and across locations, may provide deeper insight into, and a faster response to, changes in mental health for these communities.

Unlike the existing approaches on predicting the dynamics of influenza outbreaks from social media, we focus on studying spatiotemporal variations in emotions and sentiments expressed by military populations and correlate these affect signals with clinical diagnosis data collected from 31 global military areas. Our main contributions include (1) identifying how emotions and sentiments expressed on Twitter differ between military and civilian populations over time and space, (2) identifying differences in affects expressed during high and low ILI seasons for military and control populations, (3) qualitatively assessing and quantitatively estimating correlations between ILI clinical visit data and affects expressed by military populations in social media over time, and (4) investigating whether affects identified from user tweets lead or lag ILI clinical visit data, and thus, can be used to build ILI predictive models.

2 Data

In this section, we present ILI-related clinical visit data for military personnel and their families across 31 geolocations. We describe data collection, sampling, and annotation procedures of social media from both military and civilian populations.

2.1 ILI-related clinical visit data

The ILI clinical data consists of the number of visits to a Defense Medical Information System (DMIS) Identifier (ID) location for symptoms identified as influenza-like illness (ILI) based on the International Statistical Classification of Diseases and Related Health Problems (ICD) codes documented in the electronic patient record (Table 1). The DMIS ID facility types identified as reporting ILI symptoms in patients are hospitals, clinics, administration, and dental offices. The patients who visit these facilities include active duty, reserve, and retired members along with their dependents and cadets, recruits and applicants for active duty from the army, navy, marine corps, coast guard, air force, National Oceanic and Atmospheric Administration, and other public health service. This military health data was collected from 31 specific locations (25 U.S. and 6 international). Each location comprised all DMIS IDs within a 25-mile radius around military bases (mean 7 IDs, range 2-19 IDs). The total number of health-related visits to these facilities between 2011 and 2014 was summarized (Table 2) and the percentage of ILI visits relative to total visits was used in the subsequent analyses (mean 3.6%, range 1.1-9.5%).

Table 1 The ICD-9 codes used to describe ILI symptoms and their descriptions
Table 2 The geolocation’s facility types, mean ILI-visits, and mean all-visits for 2011-2014

2.2 Social media data

We used social media data to study military populations across different geolocations in the U.S. and internationally. The Twitter data, acquired from a social media vendor and through the public Application Programming Interface, was anonymized for usernames, user IDs, and tweet IDs based on a rigorous procedure, i.e., a state-of-the-art encryption algorithm. To ensure privacy of all users in our sampled datasets, our analysis was based only on completely anonymized data and our findings are reported on an aggregate rather than individual level. This study was approved by our institutional review board.

2.2.1 Global Military Twitter Dataset (aka Global Military Dataset)

Geo-tagged social media data has been previously used to identify specific populations and demographic groups in social networks, e.g., geographical communities and urban areas [33], users with specific demographics [34] and personality [35], and new mothers [36].

To study communications generated by military populations in social media, we collected geo-tagged tweets within a 25-mile radius of 25 military locations in the U.S. (l) and six international locations (i) following standard practices on extracting geolocation coordinates from user meta-data [23]. The tweets were collected over 167 weeks between November 2011 and December 2014. Our large Twitter sample includes 171 million tweets produced within a 25-mile radius across 31 military locations. We report the tweet distribution for each military location, i.e., point of interest, in Figure 1.
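The 25-mile radius criterion can be illustrated with a great-circle distance check. The sketch below is not the collection pipeline used in this study; the function names, the tweet record layout (`lat`, `lon` keys), and the use of the haversine formula with a mean Earth radius are assumptions made for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (assumed constant)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def tweets_near(tweets, base_lat, base_lon, radius_miles=25.0):
    """Keep geo-tagged tweets (hypothetical dicts with 'lat'/'lon') inside the radius."""
    return [t for t in tweets
            if haversine_miles(t["lat"], t["lon"], base_lat, base_lon) <= radius_miles]
```

A tweet located 0.1 degrees of latitude from a base (about 7 miles) would be kept, while one a full degree away (about 69 miles) would be dropped.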

Figure 1

Twitter dataset statistics across 31 military locations. The number of tweets collected within a 25-mile radius of military installations for 31 geolocations.

2.2.2 U.S. military and non-military Twitter timeline dataset (aka Comparison Dataset)

To identify whether differences exist between affects expressed in tweets from military and non-military (i.e., control) populations, we identified Twitter users from each population. Two military and one control user sample datasets were identified from one state each in the West Coast (\(p_{3}\), \(p_{10}\)), South East (\(p_{22}\), \(p_{23}\)), and South Central U.S. (\(p_{12}\), \(p_{15}\)). The most recent methods in the literature that identify specific users in social media rely on searching for certain keywords in user profile metadata [23, 37–39]. Our method for sampling sub-populations of users combined geolocation with the more common keyword extraction from user biography fields. The military users were identified from tweets within a 25-mile radius of military facilities with high military-to-civilian population ratios and from military-specific keywords, e.g., military, corporal, army brat, etc., present in their user profile data. Conversely, the control population users were sampled from a geolocation at least 50 miles away from any military facility in the same state, and their profiles did not contain any military-specific keywords from our lexicon. Note: our control sample may include military users if they never stated their membership in their profile descriptions and resided more than 50 miles away from any military facility at the point of user classification. Once the users were identified as control or military, user timelines containing the most recent 3,200 tweets were collected, regardless of the user’s location. The resulting dataset of military and control timelines spans over 313 weeks between Jan 2009 and Dec 2014 and contains eight million tweets from the six initial user identification sites (Table 3). A more detailed explanation of the sampling and annotation processes for military vs. control populations is given in [40].
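The sampling rule described above, which combines distance to a military facility with profile keywords, can be sketched as follows. This is an illustrative reading only: the lexicon contains just the three example keywords named in the text (the full lexicon is not given here), and the function names and the whole-word matching rule are assumptions.

```python
import re

# Only the example keywords mentioned in the text; the real lexicon is larger.
MILITARY_KEYWORDS = ("military", "corporal", "army brat")


def has_military_keyword(bio, keywords=MILITARY_KEYWORDS):
    """True if any keyword occurs as a whole word or phrase in the profile text."""
    text = bio.lower()
    return any(re.search(r"\b" + re.escape(k) + r"\b", text) for k in keywords)


def classify_user(bio, miles_to_nearest_base):
    """Return 'military', 'control', or None when neither rule applies."""
    keyword = has_military_keyword(bio)
    if miles_to_nearest_base <= 25 and keyword:
        return "military"
    if miles_to_nearest_base >= 50 and not keyword:
        return "control"
    return None  # ambiguous users are left out of both samples
```

Under these assumptions, users who match only one of the two criteria (for example, a keyword match far from any base) are excluded rather than assigned to a group.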

Table 3 The distribution of military \(\pmb{T_{M}}\) , control \(\pmb{T_{C}}\) , and total T tweets per Twitter point of interest

3 Methods

3.1 Sentiment and emotion classification models

To predict sentiments and emotions arising from tweets, we used machine learning and natural language processing techniques to build supervised classification models [34] extending recently developed approaches for affect prediction in social media [41–45]. We relied on all tweets produced by military and control populations in specific geolocations to go beyond influenza-related keywords and tweets previously used to predict ILI [15], and to capture other linguistic predictors, e.g., discourse about the weather, personal well-being, travel, indoor and outdoor activities, etc. We trained sentiment models on tweets annotated with three opinion classes downloaded from seven publicly available sentiment datasets [46]. The training data for sentiment classification includes \(T_{S}=19{,}555\) tweets in total (35% positive, 30% negative, and 35% neutral) from multiple domains, e.g., health, debates, politics, etc. To evaluate the predictive power of our sentiment model on tweets from the general domain, we tested our model on the official SemEval-2013 test set [47] of 3,223 tweets and report F1 = 0.66 for 3-way classification. We found that our sentiment model is comparable with the state-of-the-art systems for sentiment classification on Twitter [41, 42].

We trained our emotion models on tweets annotated with basic emotion hashtags that correspond to Ekman’s six emotion classes - joy, sadness, disgust, surprise, anger, and fear [34, 43]. Despite the limitation that existing approaches do not disambiguate sarcastic hashtags, e.g., It’s Monday #joy vs. It’s Friday #joy, they still demonstrate that a hashtag is a reasonable representation of real feelings [48]. Similar to Gonzalez-Ibanez (2011), we collected tweets with the hashtags at the end of the tweet, and excluded non-English tweets and tweets with fewer than three tokens [48]. Moreover, we extended our emotion-hashtag dataset with emotion synonyms collected from WordNet-Affect [49] and Roget’s Thesaurus [50] to lower the rate of false positives that might be present in the data due to the sarcasm factor. Overall, we collected \(T_{E}=52{,}925\) tweets annotated with anger (9.4%), joy (29.3%), fear (17.1%), sadness (7.9%), disgust (24.5%), and surprise (15.6%).
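The hashtag-based labeling rule can be illustrated with a short sketch. It assumes one hashtag per emotion class (the synonym extension from WordNet-Affect and Roget’s Thesaurus is not reproduced), counts the trailing hashtag as a token when applying the three-token minimum, and omits language filtering; the function name and these choices are assumptions, not details confirmed by the text.

```python
# One seed hashtag per Ekman class; synonym hashtags are deliberately omitted.
EMOTION_HASHTAGS = {
    "#joy": "joy", "#sadness": "sadness", "#disgust": "disgust",
    "#surprise": "surprise", "#anger": "anger", "#fear": "fear",
}


def label_from_trailing_hashtag(tweet, min_tokens=3):
    """Return (text_without_hashtag, emotion) or None if the tweet is not usable."""
    tokens = tweet.split()
    if len(tokens) < min_tokens:
        return None  # too short to be informative
    label = EMOTION_HASHTAGS.get(tokens[-1].lower())
    if label is None:
        return None  # the emotion hashtag must be the last token
    return " ".join(tokens[:-1]), label
```

Removing the hashtag from the training text prevents a classifier from simply memorizing the label token.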

For emotion and sentiment classification, we trained tweet-level log-linear models with L2 regularization (aka logistic regression) \(\Phi_{s}\) (as defined in Eq. (1)) and \(\Phi_{e}\) (as defined in Eq. (2)) using the scikit-learn toolkit [51]. The models rely on lexical n-gram features extracted from tweets \(t_{k} \in T\) annotated with three opinion \(s_{i} \in s\) and six emotion \(e_{j} \in e\) categories. In addition to lexical features, we extracted a set of syntactic and stylistic markers from tweets, including emoticons, elongated words, capitalization, repeated punctuation, and number of hashtags, and took into account clause-level negation [34].

$$\begin{aligned}& \Phi_{s} = \operatorname{argmax}_{s_{i}} \operatorname{Pr}( S = s_{i} \mid t), \end{aligned}$$
(1)
$$\begin{aligned}& \Phi_{e} = \operatorname{argmax}_{e_{j}} \operatorname{Pr}( E = e_{j} \mid t). \end{aligned}$$
(2)
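The stylistic markers listed above (emoticons, elongated words, capitalization, repeated punctuation, and hashtag counts) could be extracted along the following lines. The regular expressions and feature names are illustrative assumptions; the exact definitions used for the models are not specified in the text, and clause-level negation is not covered.

```python
import re


def stylistic_features(tweet):
    """Count simple stylistic markers in a tweet (illustrative definitions)."""
    tokens = tweet.split()
    return {
        # tokens that start with '#' and have at least one more character
        "n_hashtags": sum(1 for t in tokens if t.startswith("#") and len(t) > 1),
        # tokens containing the same word character three or more times in a row
        "n_elongated": sum(1 for t in tokens if re.search(r"(\w)\1{2,}", t)),
        # purely alphabetic tokens written entirely in upper case
        "n_all_caps": sum(1 for t in tokens
                          if len(t) > 1 and t.isalpha() and t.isupper()),
        # runs of two or more '!', '?' or '.' characters
        "n_repeated_punct": len(re.findall(r"[!?.]{2,}", tweet)),
        # a small family of western emoticons such as :) ;-( =D
        "n_emoticons": len(re.findall(r"[:;=][-']?[)(DPp]", tweet)),
    }
```

Such counts would typically be appended to the n-gram feature vector before fitting the log-linear model.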

We evaluated our emotion model prediction quality using 10-fold cross validation on our emotion dataset of \(T_{E}=52{,}925\) tweets and report weighted F1 = 0.78 for 6-way classification. We found that our emotion model significantly outperforms the existing approaches for emotion classification [43, 52–54].
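For readers unfamiliar with the metric, a weighted F1 score can be computed as in the sketch below. It assumes the common definition in which per-class F1 scores are averaged with weights proportional to class support in the gold labels; the text does not state which weighting was used, and the function name is hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        f1 = 2 * tp / (2 * tp + fp + fn)  # denominator >= n > 0
        score += (n / total) * f1
    return score
```

With gold labels `["joy", "joy", "fear", "anger"]` and predictions `["joy", "fear", "fear", "anger"]`, the per-class F1 scores are 2/3, 2/3, and 1, giving a weighted F1 of 0.75.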

3.2 Evaluation metrics

3.2.1 ILI and affect proportions

To study the relationships between clinical ILI visit data and affects derived from tweets, we define ILI proportions per location \(l \in L\), with each location aggregated over multiple DMIS IDs, and per week \(w \in W\).

Weekly location-specific ILI visit proportions:

$$ I_{w, l} = \frac{\# \mbox{ of weekly ILI visits per location }l}{\mbox{weekly total visits per location }l}. $$
(3)

After applying our emotion and sentiment classification models, every tweet in our dataset was annotated with its predicted affects. We aggregated affect annotations over time and geolocations to obtain weekly location-specific sentiment and emotion proportions as shown below.

Weekly location-specific sentiment proportions over three sentiment classes \(s_{i} \in s = \{\textit{positive}, \textit{negative}, \textit{neutral}\}\), with \(\sum_{i} S_{w, l}^{s_{i}} = 1\):

$$ S_{w, l}^{s_{i}} = \frac{\# \mbox{ of weekly tweets labeled with sentiment }s_{i}\mbox{ per location }l}{\# \mbox{ of weekly tweets per location }l}. $$
(4)

Weekly location-specific emotion proportions over Ekman’s six emotion classes \(e_{j} \in e = \{\textit{joy}, \textit{sadness}, \textit{fear}, \textit{disgust}, \textit{surprise}, \textit{anger}\}\), with \(\sum_{j} E_{w, l}^{e_{j}} = 1\):

$$ E_{w, l}^{e_{j}}= \frac{\# \mbox{ of weekly tweets labeled with emotion }e_{j}\mbox{ per location }l}{\# \mbox{ of weekly tweets per location }l}. $$
(5)
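The proportions in Eqs. (3)-(5) amount to grouped counting. The sketch below assumes tweets arrive as `(week, location, label)` triples with exactly one predicted label per tweet; the function names and record layout are hypothetical.

```python
from collections import Counter, defaultdict


def weekly_proportions(records):
    """Map (week, location) to {label: share of that week's tweets}."""
    counts = defaultdict(Counter)
    for week, location, label in records:
        counts[(week, location)][label] += 1
    return {
        key: {label: n / sum(c.values()) for label, n in c.items()}
        for key, c in counts.items()
    }


def ili_proportion(ili_visits, total_visits):
    """Eq. (3): share of ILI visits among all visits for one week and location."""
    return ili_visits / total_visits if total_visits else float("nan")
```

Because each tweet carries one label, the proportions for a given week and location sum to one.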

We aggregated weekly clinical visit data \(I_{w, l}\), sentiment \(S_{w, l}^{s_{i}}\), and emotion \(E_{w, l}^{e_{j}}\) proportions by geolocation to construct the ILI and affect proportion time series. We then applied correlation [55] and cross-correlation analysis [56] over ILI and affect location-specific time-series to study (1) what emotions and sentiments have positive vs. negative correlations with clinical ILI visit data, and (2) whether affects are predictive of clinical ILI visit data, respectively.

To study the relationships between ILI and affects expressed in social media, we used Pearson correlation to measure the degree of linear dependence between two variables, e.g., location-specific sentiment proportions \(S_{w,l}^{s_{i}}\) (simplified as S below) and ILI clinical visit proportions \(I_{w,l}\) (simplified as I below):

$$ \rho_{S, I} = \frac{\operatorname{cov}(I, S)}{\sigma_{S} \sigma_{I}} = \frac{ \mathbf{E}[(S - \mu_{S})(I - \mu_{I})]}{\sigma_{S} \sigma_{I}}, $$
(6)

where \(\mu_{I}\) and \(\sigma_{I}\) are the mean and standard deviation for I; and similarly for S; E indicates the expected value.
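As a concrete reading of Eq. (6), the sample Pearson correlation between two equally long weekly series can be computed as follows. This is a minimal standard-library sketch with a hypothetical function name; it returns NaN when either series is constant.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / (sx * sy)
```

The normalization by \(n\) cancels between numerator and denominator, so it is omitted from both.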

To study the predictive power of emotions and sentiments originating from tweets produced by military populations in social media, we used cross-correlation analysis to measure similarity between two series e.g., location-specific emotion proportions \(E_{w,l}^{e_{j}}\) (shown as E) and clinical ILI visit data \(I_{w,l}\) (shown as I) as a function of the lag of one relative to the other:

$$ \rho_{I, E}(\tau) = \frac{\mathbf{E}[(I_{w} - \mu_{I})(E_{w+\tau} - \mu_{E})]}{ \sigma_{I} \sigma_{E}}, $$
(7)

where \(\tau\) is the lag in weeks; \(\mu_{I}\) and \(\sigma_{I}\) are the mean and standard deviation of the process (\(I_{w}\)), and similarly for (\(E_{w}\)); \(\mathbf{E}\) indicates the expected value.
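A lagged correlation of this kind can be sketched by shifting one series against the other and correlating the overlapping weeks. The sketch below makes two assumptions beyond the text: means and standard deviations are estimated on the overlapping segment rather than on the full series, and a negative lag denotes an affect that leads ILI. All function names are hypothetical.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")


def cross_correlation(ili, affect, lag):
    """Correlate ili[w] with affect[w + lag] over the overlapping weeks."""
    n = len(ili)
    if lag >= 0:
        a, b = ili[:n - lag], affect[lag:]
    else:
        a, b = ili[-lag:], affect[:n + lag]
    return pearson(a, b)


def best_lag(ili, affect, max_lag=4):
    """Return (lag, correlation) with the largest absolute correlation."""
    scored = [(lag, cross_correlation(ili, affect, lag))
              for lag in range(-max_lag, max_lag + 1)]
    scored = [(lag, r) for lag, r in scored if not math.isnan(r)]
    return max(scored, key=lambda item: abs(item[1]))
```

Under this sign convention, a best lag of -1 means the affect series at week \(w-1\) tracks ILI at week \(w\), i.e., the affect leads ILI by one week.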

3.2.2 U.S. military and non-military affect time-series

Similar to Eq. (4) and (5), we constructed military-specific vs. non-military-specific ILI and affect time-series from the 8 million tweets annotated in the Comparison dataset. We then performed a Mann-Whitney U test [57] to investigate whether emotions and sentiments expressed by military vs. control populations during the same time frame are statistically different.
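For illustration, the Mann-Whitney U statistic and an approximate two-sided p-value can be computed as below. The sketch uses the normal approximation with a tie correction and no continuity correction, which is an assumption; an established routine such as `scipy.stats.mannwhitneyu` would normally be preferred, and the text does not state which implementation was used.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value via normal approximation)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all values tied: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u1, p
```

Here `x` and `y` would hold, for example, the weekly proportions of one emotion for the military and control samples at the same point of interest.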

Moreover, to further study the affect differences between military and control populations, we estimated emotion and sentiment military-to-control ratios for every sentiment \(s_{i}\) and emotion \(e_{j}\) class over time across six points of interest. The military-to-control emotion ratio and sentiment ratio are defined similarly. As an example, we present the emotion ratio:

$$ R (w, p, e_{j}) = \frac{\# \mbox{ of weekly military tweets labeled with emotion }e_{j}\mbox{ per point }p}{\# \mbox{ of weekly control tweets labeled with emotion }e_{j}\mbox{ per point }p}. $$
(8)
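Eq. (8) and the summary statistic built on it (the share of weeks in which the military sample expresses more of an affect) can be sketched as follows. The input layout (`{week: count}` dictionaries for one emotion at one point of interest) and the decision to skip weeks with no control tweets are assumptions.

```python
def military_to_control_ratio(military_counts, control_counts):
    """Weekly ratio of military to control tweet counts for one affect class."""
    ratios = {}
    for week in sorted(set(military_counts) & set(control_counts)):
        control = control_counts[week]
        if control == 0:
            continue  # ratio undefined; skipped in this sketch
        ratios[week] = military_counts[week] / control
    return ratios


def share_of_weeks_above_one(ratios):
    """Fraction of weeks in which the military sample expresses more of the affect."""
    if not ratios:
        return float("nan")
    return sum(1 for r in ratios.values() if r > 1) / len(ratios)
```

Weeks with few tweets produce unstable ratios, so in practice a minimum-volume threshold may be worth adding.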

3.2.3 Emotion and sentiment analysis in low vs. high ILI season

To further evaluate the emotions and sentiments expressed in social media during high and low ILI periods, we identified, for each location, the 25 weeks with the highest and the 25 weeks with the lowest ILI visit proportions \(I_{w,l}\). Then, we extracted the corresponding emotion and sentiment proportions \(E_{w,l}\) and \(S_{w,l}\) over the same weeks. We observe that between 2011 and 2014 the highest ILI proportions were reported during winter months, as expected for temperate regions [58].

To estimate whether affects expressed in social media are statistically significantly different between high and low ILI periods, we apply a Mann-Whitney U test to (a) 6 military vs. control affect distributions and (b) 31 location-specific affect distributions.
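The selection of high and low ILI weeks described in this section can be sketched as a simple ranking step. The function names and the `{week: value}` input layout are assumptions; the resulting two affect samples would then be passed to a two-sample test such as Mann-Whitney U.

```python
def high_low_weeks(ili_by_week, k=25):
    """Return (weeks with the k highest, weeks with the k lowest) ILI proportions."""
    if 2 * k > len(ili_by_week):
        raise ValueError("not enough weeks for disjoint high and low sets")
    ordered = sorted(ili_by_week, key=ili_by_week.get)  # ascending by ILI proportion
    return ordered[-k:], ordered[:k]


def affect_in_weeks(affect_by_week, weeks):
    """Collect the affect proportions observed in the given weeks."""
    return [affect_by_week[w] for w in weeks if w in affect_by_week]
```

For example, with `k=2` and ILI proportions `{1: 0.02, 2: 0.05, 3: 0.01, 4: 0.09, 5: 0.03}`, weeks 2 and 4 form the high set and weeks 3 and 1 the low set.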

4 Results

We begin this section with our novel findings on sentiment and emotion differences in social media data between military and non-military populations (i.e., the Comparison Dataset). First, we present the general differences identified in sentiments and emotions of the two populations. Second, we discuss the differences between the populations during weeks with high and low ILI visits for a subset of 25 weeks. Next, we present a detailed analysis of the Global Military Dataset. We first focus on the relationships between Twitter point-specific affect and ILI clinical visit time-series evaluated using correlation [55] and cross-correlation analysis [56]. Then, we discuss the affect differences within the Global Military Dataset between 25 weeks with high and 25 weeks with low ILI visits. Finally, we evaluate the predictive power of affects for nowcasting ILI dynamics using the state-of-the-art machine learning models for 31 locations in the Global Military Dataset.

4.1 The variations in affects between military and non-military populations

4.1.1 Differences in emotions and sentiments expressed in the comparison dataset

We observed that emotions and sentiments identified in tweets vary significantly over time for military and control populations (i.e., Comparison Dataset). Figure 2 presents military-to-control sentiment and emotion ratios (defined in Eq. (8)) over time for an example Twitter set of interest \(p_{12}\). Ratios \(R (w, p_{12},e _{j}) > 1\) indicate weeks in which the military population expresses more of a certain affect, whereas ratios \(R (w, p_{12},e_{j}) \leq 1\) indicate weeks in which the control population expresses the same amount or more of a specific emotion or sentiment. Overall, the estimated military-to-control ratios over time from the six Twitter sets containing military and control tweet annotations (as shown in Table 3) can be summarized as follows. On average, over 313 weeks (from 01/2009 to 12/2014) across six Twitter points, the military population expresses:

  • More negative (76% of all weeks military expressed more negative opinions compared to the control) and less positive (30% of all weeks military expressed more positive opinions compared to the control) sentiments compared to the control population.

  • More anger (62%) and disgust (60%) than the control population, but less joy (36%).

Figure 2

Emotion and sentiment military-to-control ratios over time. Sentiment ratios are estimated for three opinion types (positive, negative, and neutral) and emotion ratios are estimated for Ekman’s six emotions (joy, sadness, fear, surprise, anger, and disgust) over the period from 01/2009 to 12/2014 for an example Twitter set \(p_{12}\). The set of military and control users is the same over time.

High variance (more spikes) in military-to-control ratios was observed between 2009 and 2011 across all affects in Figure 2. This time period has the lowest number of tweets produced by military and control populations, as shown in Figure 3. Here, a small change in affects extracted from tweets can have a large effect on the calculated ratio. Thus, our emotion and sentiment proportion estimates are less reliable during these years than between 2011 and 2014. Another reason for the high variance is that the timeline of tweets collected from a user most likely contains tweets originating from various locations. This highlights the role of location as an important cofactor in affects expressed by military and control populations.

Figure 3

Military and control tweets per week from 01/2009 to 12/2014 across six geolocations. The weekly number of tweets varies across locations and over time, e.g., between 2009 and 2011 the number of weekly tweets is significantly lower than between 2011 and 2014.

Nevertheless, our findings that military personnel express more negative emotions and sentiments than the control population across several regions concur with the recent report on well-being and mental-health risk among the U.S. military. The large-scale study found that the risk of depression, intermittent explosive disorder, and PTSD is much higher in military than civilian populations in the U.S. [32].

In Figure 2 we demonstrated that military and control populations express different sentiments and emotions in their communications in six separate datasets over time. In order to find whether these differences are statistically significant, we performed a Mann-Whitney U test (Table 4). We found that for the majority of affect types mean differences are statistically significant except for:

  • fear (\(p_{12}\), \(p_{23}\), \(p_{15}\), \(p_{22}\), \(p_{10}\)) and sadness (\(p_{3}\), \(p_{23}\), \(p_{22}\), \(p_{10}\)),

  • surprise (\(p_{12}\), \(p_{3}\)), anger (\(p_{12}\), \(p_{22}\)), and disgust (\(p_{12}\), \(p_{15}\)),

  • joy and neutral sentiment (\(p_{23}\)).

Table 4 Differences in emotion and sentiment proportions between military (m-) and control (c-) populations

Overall, we observed that military populations express significantly different emotions and opinions in social media compared to civilian populations at the same time, except for two emotions - sadness and fear. According to the Mann-Whitney U test, differences in sadness and fear over time across the majority of areas of interest for military and control populations are not statistically significant.

To provide more insight into our findings on tweet-based aggregate analysis of affects over time, we perform user-based analysis that ignores the temporal component. For that, we aggregate affects expressed by each user between 2011 and 2014 and contrast the means for each affect type across military and control users. Our results in Table 5 demonstrate that, similar to the aggregated tweet-based analysis, military and control users significantly differ in the way they express emotions and sentiments in their tweets (except for sadness emotion and positive sentiment). For the majority of this study, we focus on the population tweet-level analysis of affects over time to understand how they relate to ILI incidence.

Table 5 Differences in emotion and sentiment proportions expressed by military and control users between 2011-2014 aggregated over 6 points of interest

4.1.2 Affect differences in high and low ILI seasons between military and non-military populations

Figure 4 presents mean differences for emotions and opinions extracted from military and control populations during high vs. low ILI seasons. We only report statistically significant differences. Moreover, since we perform multiple statistical tests, some will have p-values ≤ 0.05 purely by chance, even if all null hypotheses are true. Thus, to control the false discovery rate we use the Benjamini-Hochberg procedure [59]. Our key significant observations are outlined below.

  • Military population, regardless of whether ILI was high or low, expressed more anger and disgust emotions, and negative sentiment than control population.

  • Control population expresses more joy emotion and positive sentiment compared to military population.

  • We have not observed statistically significant differences between military and control populations for fear and surprise emotions in low ILI periods, and neutral sentiment during high ILI periods.

  • Both military and control populations express more anger (\(p _{10}\), \(p_{3}\)) and negative (\(p_{3}\)) sentiment during low ILI season; more joy (\(p_{3}\)) and positive (\(p_{10}\)) sentiment during high ILI period; military population expresses more disgust during low ILI period (\(p_{3}\)).

Figure 4

Mean affects during high vs. low ILI seasons across six Twitter sets. We only report differences that are statistically significant according to the Mann-Whitney U test (p-value ≤ 0.05). To control the false discovery rate we apply the Benjamini-Hochberg correction: * represents the Benjamini-Hochberg critical value with a false discovery rate of 0.2; the remaining points use the critical value with a false discovery rate of 0.1.
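The Benjamini-Hochberg step-up procedure used for the comparisons in this section can be sketched as follows. The function name is hypothetical, and the default false discovery rate of 0.1 simply mirrors one of the two rates quoted in the Figure 4 caption.

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # critical value for the rank-th smallest p-value is (rank / m) * fdr
        if p_values[idx] <= rank / m * fdr:
            cutoff = rank  # keep the largest rank that passes
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected
```

Note that the procedure rejects every hypothesis ranked at or below the largest passing rank, so a p-value can be rejected even when it exceeds its own critical value.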

We present a detailed analysis of inter-affect correlations for six points of interest on the West Coast (\(p_{3}\), \(p_{10}\)), South East (\(p_{22}\), \(p_{23}\)), and South Central U.S. (\(p_{12}\), \(p_{15}\)) in Figures 9-14 in Additional file 1.

4.2 The relationships between ILI and affects across global military populations

4.2.1 Correlations between ILI and affect time-series across military locations

In the previous section, we showed that ILI-Affect correlations are different with some general trends observed. To investigate these differences, we present ILI-Affect correlations across 31 military geolocations from our Global Military Dataset (Figure 5). We performed correlation and cross-correlation analysis by subsampling ILI and affect data during influenza seasons (September through May) for several years between November 2011 and December 2014.

Figure 5

ILI-Affect correlations across 31 military locations. Positive (blue) and negative (red) correlations between ILI visits and tweet affect time series by location. Locations (columns) and affects (rows) grouped by similarity using the Euclidean distance measure are shown as dendrograms.

Figure 5 demonstrates that across the majority of geolocations ILI positively correlates with sadness (except \(i_{20}\), \(i_{25}\), \(l_{19}\) and \(l_{32}\)) and neutral sentiment (except \(i_{3}\)), and negatively correlates with disgust (except \(i_{3}\)), fear (except \(l_{34}\), \(i_{3}\), \(l_{31}\), \(l_{19}\)), and surprise (except \(i_{3}\), \(l_{13}\), \(l_{31}\)) emotions, and positive sentiment (except \(l_{34}\), \(l_{13}\), \(l_{33}\)). We found that the direction of ILI-Affect correlations varies by location for joy (9 positive and 15 negative), anger (11 positive and 17 negative), and negative sentiment (13 positive and 14 negative).

Moreover, the dendrogram that groups military locations (columns) and affects (rows) by similarity using the Euclidean distance measure shows that:

  • Sadness and neutral sentiment positively correlate with ILI visits.

  • Disgust, fear, surprise emotions and positive sentiment negatively correlate with ILI visits.

The strongest identified affect time series correlations to ILI visits are:

  • Positive: joy (\(i_{25}\)), negative sentiment (\(l _{32}\), \(i_{17}\), \(i_{3}\)), neutral sentiment (\(l_{27}\), \(l_{29}\), \(l_{34}\), \(i_{20}\)), sadness (\(i_{3}\), \(l_{31}\)), anger (\(l_{31}\)), disgust (\(i_{3}\)), fear (\(l _{34}\), \(l_{19}\)), surprise (\(l_{21}\), \(l_{13}\)), and positive sentiment (\(l_{33}\), \(l_{13}\)).

  • Negative: joy (\(l_{34}\)), negative sentiment (\(i _{20}\)), neutral (\(i_{17}\)), sadness (\(i_{25}\), \(i_{20}\)), anger (\(l_{3}\), \(l_{4}\)), disgust (\(l _{37}\), \(l_{19}\)), fear (\(i_{25}\), \(l_{0}\), \(l_{10}\)), surprise (\(i_{25}\)), and positive sentiment (\(i_{3}\), \(l_{31}\), \(l_{15}\), \(l_{32}\)).

4.2.2 Cross-correlations between ILI and affect time-series across military locations

Previous research has shown that the discourse in social media is predictive of influenza outbreaks [15, 17]. In this section, we evaluate whether sentiments and emotions extracted from Twitter communication have the potential to predict influenza dynamics across military geolocations. We apply cross-correlation analysis to assess the predictive power of affects on ILI proportions in our Global Military Dataset. This analysis enables us to identify which affects are seen before ILI visits (i.e., lead) and which appear after ILI visits (i.e., lag). We report lead and lag intervals in weeks for every affect across 31 military geolocations in Figure 6.

Figure 6

The lead and lag intervals in weeks for ILI-Affect cross-correlations across 31 military locations. Cells of different colors represent lagging (+) vs. leading (−) intervals between 0 and 4 (or more) weeks for statistically significant cross-correlation results (p-value ≤ 0.05, 95% confidence interval). Locations (columns) and affects (rows) are reordered by similarity using the Euclidean distance measure and are shown as a dendrogram.

We observed that specific emotions and sentiments reliably lead ILI data (shown as blue and green in Figure 6 across geolocations) and are therefore candidates for predicting ILI proportions in lagged regression models for ILI disease forecasting. On the other hand, the affects that lag ILI data may be useful for nowcasting disease. Our key novel findings are the following:

  • Disgust emotion and all types of sentiment lead ILI proportions for most geolocations between 1 and 4 weeks.

  • Anger, surprise, fear, sadness, and joy emotions lead or lag ILI proportions, depending on geolocations, between 1 and 4 weeks.

By combining the 31 locations by U.S. region and separating them from the international locations, we are able to clearly visualize the regional similarities in which affects lead ILI visits in the cross-correlation analysis (Table 6). The U.S. is split into three regions, the southeast (\(n = 13\)), south central (\(n = 4\)), and west coast (\(n = 5\)), and the international locations (\(n = 6\)) are treated as one group. Note: two single U.S. locations could not be grouped into a region based on geolocation. The differences in affects across locations are evident here: for specific affects, many regions have a high percentage (80-100) of locations with predictive status, whereas other regions have a low percentage (0-40). The affects with consistently high percentages are neutral sentiment in all locations and joy in the U.S.; anger has consistently low percentages across all locations.
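The regional percentages summarized in Table 6 amount to a simple grouped count. The sketch below illustrates that computation under stated assumptions: the location names, region labels, and sets of leading affects are hypothetical placeholders, not values from the study.

```python
from collections import defaultdict

def percent_leading_by_region(region_of, leading_affects):
    """For each region, the percent of its locations where each affect leads ILI.

    region_of: dict mapping location -> region label.
    leading_affects: dict mapping location -> set of affects found to lead ILI.
    """
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for location, region in region_of.items():
        totals[region] += 1
        for affect in leading_affects.get(location, ()):
            hits[region][affect] += 1
    return {
        region: {affect: 100.0 * count / totals[region]
                 for affect, count in hits[region].items()}
        for region in totals
    }

# Hypothetical toy input (not study data).
region_of = {"loc_a": "southeast", "loc_b": "southeast", "loc_c": "west coast"}
leading_affects = {
    "loc_a": {"neutral", "joy"},
    "loc_b": {"neutral"},
    "loc_c": {"neutral", "disgust"},
}
table = percent_leading_by_region(region_of, leading_affects)
```

Affects that never lead in a region are simply absent from that region's entry, which corresponds to 0%.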

Table 6 Percent of locations that contain the affect as a potential predictive factor of ILI

4.2.3 Measuring affect differences in high vs. low ILI seasons across 31 locations

Figure 7 reports the highest mean numbers during high and low seasons for every emotion and sentiment type across 31 locations (i.e., the Global Military Dataset). Our key significant findings in terms of sentiment and emotion means are listed below.

  • Neutral sentiment and sadness emotion are higher during high ILI periods.

  • Positive sentiment, anger and surprise are higher during low ILI periods.

  • Negative sentiment and disgust, joy and fear emotion means vary across locations during high and low ILI periods.

Figure 7

Emotion and sentiment differences in high vs. low ILI seasons across 31 military locations. We report the highest mean numbers (either the mean during high or low season) when the differences between high vs. low affect distributions are statistically significant (p-value ≤ 0.05) for 31 locations. To control the false discovery rate, we apply the Benjamini-Hochberg correction: represent Benjamini-Hochberg critical value with a false discovery rate of 0.2; and the rest of points - critical value for a false discovery rate of 0.1.
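For readers unfamiliar with the Benjamini-Hochberg correction mentioned in the caption, a minimal sketch of the standard step-up procedure follows. For m p-values sorted in ascending order, the critical value at rank k is (k/m)·Q, where Q is the chosen false discovery rate, and all hypotheses up to the largest rank whose p-value does not exceed its critical value are rejected. The function name and the example p-values are illustrative assumptions, not the study's code or results.

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Return a list of booleans, True where the hypothesis is rejected.

    Standard step-up procedure: find the largest rank k (1-based, ascending
    p-values) with p_(k) <= (k / m) * fdr, then reject ranks 1..k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected

# Illustrative p-values (not from the study).
example = [0.01, 0.04, 0.03, 0.5]
significant_at_10 = benjamini_hochberg(example, fdr=0.1)
```

A larger false discovery rate (e.g., 0.2 instead of 0.1) raises every critical value and can therefore only keep or increase the number of rejected hypotheses.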

4.2.4 Predicting ILI dynamics from affects

We ran preliminary experiments on using affects (emotion and sentiment proportions) to predict location-specific ILI dynamics with machine learning models. We relied on regression models previously used with social media features for ILI dynamics prediction [17], namely AdaBoost (with a decision tree regressor), Random Forest, and Linear regressors, as implemented in scikit-learn [51]. We report results for nowcasting, i.e., predicting the current week's %ILI, in Table 7. We evaluate the predictive power of affects using several metrics between predicted and true ILI values as defined in [17]: Pearson correlation (CORR), Root Mean Squared Error (RMSE), and Maximum Absolute Percent Error (MAPE). We train our models on data from 2012-2013 and predict for 2014.
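The nowcasting setup described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the study's pipeline: the data are synthetic (156 weeks, nine affect features), hyperparameters are library defaults, and MAPE is implemented literally as the maximum absolute percent error, which may differ in detail from the definition in [17].

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression

def evaluate(y_true, y_pred):
    """CORR, RMSE and MAPE between true and predicted %ILI."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    corr = float(np.corrcoef(y_true, y_pred)[0, 1])
    rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    # Assumption: MAPE = maximum absolute percent error; requires y_true != 0.
    mape = float(np.max(np.abs((y_true - y_pred) / y_true)) * 100.0)
    return {"CORR": corr, "RMSE": rmse, "MAPE": mape}

def nowcast(X_train, y_train, X_test, y_test):
    """Fit the three regressor families and score them on held-out weeks."""
    models = {
        "Linear": LinearRegression(),
        # AdaBoostRegressor uses a decision tree as its default base estimator.
        "AdaBoost": AdaBoostRegressor(random_state=0),
        "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = evaluate(y_test, model.predict(X_test))
    return results

# Synthetic stand-in for one location: weekly affect proportions and %ILI.
rng = np.random.default_rng(0)
X = rng.random((156, 9))                    # 3 years x 9 affect features
weights = np.linspace(0.1, 0.9, 9)          # arbitrary illustrative weights
y = 2.0 + X @ weights + rng.normal(0.0, 0.05, 156)

# Train on the first two years, evaluate on the third (mirrors 2012-2013 vs. 2014).
results = nowcast(X[:104], y[:104], X[104:], y[104:])
```

Splitting by time rather than at random keeps the evaluation honest for a nowcasting task, since the model never sees weeks from the held-out year.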

Table 7 ILI prediction results (current week estimates for %ILI) for 2014 for 31 locations from affect features using Linear, AdaBoost and Random Forest regressors

We found that Random Forest and Linear regressors yield higher performance (higher Pearson correlation and lower MAPE and RMSE) compared to AdaBoost models. Performance also varies across locations: the Linear regressor performs best for 39% of locations, Random Forest for 39%, and AdaBoost for 23%.

Our preliminary experiments demonstrate that affects are predictive (depending on location) but not sufficient to accurately predict ILI dynamics (as shown in Figure 8), especially if one wants to make forecasts several weeks in advance. However, if there is no historical ILI data available, affects could serve as predictors of ILI dynamics (Pearson correlation for locations i25, L4, L33, L37, L14, and L3 is between 0.5 and 0.58). We think that affects combined with other features extracted from tweets can boost model performance. Moreover, recently emerged deep neural networks for sequence prediction, e.g., Long Short-Term Memory models, can potentially boost predictions when learned jointly on text and affect signals. Future work may also include experimenting with models that rely on affect and language features for forecasting ILI dynamics several weeks in advance.

Figure 8

True vs. predicted %ILI (current week estimates) as a function of time for 31 geolocations. We plot true %ILI (ILI) and predictions from sentiment and emotion features made using AdaBoost (ABR), Linear (LR) and RandomForest (RFR) regressors.

5 Discussion

5.1 Variations in affects between military and non-military populations

Military populations are often thought of as small, closed communities that face different combinations of life stressors and challenges than the general public. In support of this concept, we show that emotions and sentiments expressed by military populations in their day-to-day social media discourse differ from those of surrounding non-military populations in the U.S. In general, military life consists of intense training and subsequent engagement in national security tasks, which put the lives of military personnel, or of those they work with, in dangerous and/or chaotic situations. These trauma-filled experiences can affect their families directly or indirectly. Twitter is a type of social media where communication is limited in length yet spreads rapidly to a wide audience. With only 140 characters allowed, a tweet contains the core essence of what the tweeter is trying to get across, whether emotional or factual. Along these lines, tweets can provide insight into a person’s emotional and sentimental state. This study shows that military populations tend to express more negative and less positive sentiment than control populations, in addition to increased emotions of sadness, fear, disgust and anger. These findings of more negative sentiment, including disgust and anger, in military populations and more positive sentiment in the control population hold true even when specifically looking at high and low times of ILI. These results may be attributed to the hard facts of military life in general and the day-to-day challenges that military personnel face in order to ensure protection of the civilian populations.

5.2 Correlations between ILI and affect time-series across global military locations

Studies in the literature have investigated the relationship between affects and health, many of which show that positive affects correlate with good health [18, 19]. In this paper, we examined the correlation between various affects identifiable in tweets and the health status of the subpopulation containing those tweeters, based on the number of ILI visits to a health facility. We found that people within a 25-mile radius around military bases express sentiment and emotion in tweets that correlate with ILI visits to medical facilities in the same location. These correlations mostly depend on location and affect type, which suggests that location-specific demographics and characteristics may be influencing these differences. Different location-specific aspects were investigated, including the percent of military population to civilians, the volume of tweets collected, the ratio of ILI visits to tweet volume, and the service type of the bases within each location's boundaries. The best generalizations were identified when the U.S. locations were split into southeastern, south central, and west coast regions. With this regionalization, correlations between affects and ILI visits became more apparent.

Although neutral sentiment was positively correlated with ILI visits throughout the U.S. (which means that populations are more neutral, in other words, express less opinionated tweets), positive and negative sentiments correlated differently with ILI visits regionally. In the Southeast, 77% of the military locations’ ILI visits were negatively correlated with negative sentiment. However, those locations that were positively correlated (23% of SE military) were all located within a single state and military service type. The latter positive correlation between ILI and negative sentiment was also observed in the South Central and West Coast states with a high proportion of military to civilian populations. For positive sentiment, there were varying degrees of negative correlation with ILI visits, e.g., 54% of locations in the southeast and 75% of locations in the south central U.S. For the west coast, those locations with a higher percent of military to civilians showed a similar negative correlation between ILI and positive sentiment. In contrast, a single state with multiple locations and service types showed a positive correlation between positive sentiment and ILI visits. This degree of variability in correlations based on location stresses the importance of understanding which characteristics may play a role in a subpopulation’s reaction to health situations. As discussed by Gallo and Matthews (2003), the differences in affects between locations may be attributed to the socioeconomic status of the areas and the amount of stress in the tweeter’s life [21].

The six emotions identified in tweets showed varying degrees of correlation with ILI visits. The most straightforward was a positive correlation with sadness found in all regions but with varying degrees of concurrence throughout each area, i.e., 60% of west coast location groups, 85% of the southeast, and 100% of the south central region. There was a negative correlation with anger emotion in 100% of west coast locations and 75% in south central U.S. that contained the areas with the lowest percentage of military to civilian population. Other emotions showed demographic-specific results. Disgust correlated positively with military in the air force and negatively with those in the navy, specifically in the Southeast. Fear and disgust were highest when ILI visits were highest in areas dense with military personnel yet low when ILI was highest in areas with low density of military personnel. Joy and surprise were very dependent on location, percent of military population, and service type. For example, in the west coast locations, surprise was negatively correlated with ILI in 80% of the locations and joy was positively correlated in 80% of the locations. In the south central region, military percent played the biggest role, with negative correlations between ILI visits and both surprise and joy in areas with high military to civilian populations, along with a positive correlation with surprise in areas with fewer military personnel than civilians. Lastly, in the southeastern U.S., service type seems to play the most important role, with 62% of the locations negatively correlated with surprise and all army service areas positively correlated. In the same respect, all negative correlations between joy and ILI visits were observed in army service areas in the same region.

In general, for the locations studied here, there is a correlation between sentiments and emotions identified in tweets and ILI visits to health facilities within the same areas. The strongest correlations across all U.S. locations are a negative correlation with anger and surprise, a positive correlation with sadness, and mixed results for fear, disgust, and joy. For sentiments, there was an overwhelmingly positive correlation with neutral statements, a negative correlation with positive sentiment, and mixed results for negative sentiment, depending on which region was analyzed. Very similar results were identified when we looked at the affects most present during high vs. low ILI clinical visits across datasets, i.e., neutral sentiment and sadness were expressed most during high ILI times, and positive sentiment, anger, and surprise were expressed most during low ILI periods. Interestingly, our findings support the psychology research that identifies positive affects correlating with good health. However, the converse is not naturally shown in our data. Instead, neutral sentiments, i.e., a decreased amount of strongly opinionated tweets, characterize the period of ill health, namely ILI, for the population. This lack of positive affects when ILI visits are high can be explained by Fredrickson’s paper, which emphasizes that positive emotions are not just the absence of negative emotions [20].

5.3 Cross-correlations between ILI and affect time-series across 31 military locations

By regressing ILI visit counts on affects observed 1 to 5 weeks earlier, we were able to identify which affects in tweets could be used to predict the number of ILI cases. The results were variable overall, but when broken down regionally, we were able to identify specific attributes that may be responsible for these differences, e.g., location (i.e., country, region and/or state level), military service type, and percent of military per location. Neutral sentiment and joy emotion were the only affects that were consistent across all U.S. locations as strong candidates for predictive models of ILI visits, whereas anger was consistently unreliable. In the southeast, there were strong trends by state, whereas in the west coast and south central areas, military service and the percent of the population being military played a role in predictive affects. In general, affects in the southeastern and south central regions lead ILI visits more often than in other studied locations and, therefore, may be a good starting point for predictive investigation of ILI visits.
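A lagged regression of this kind pairs the affect features from week t − lag with the ILI value at week t. The sketch below shows one way to build such a design and fit it by ordinary least squares with NumPy. The function names and the synthetic data are assumptions for illustration, and the in-sample R² used here is only a convenient score, not the significance criterion used in the study.

```python
import numpy as np

def lagged_design(affects, ili, lag):
    """Pair affect rows from week t - lag with ILI at week t (lag >= 1)."""
    if lag < 1:
        raise ValueError("lag must be at least 1 week")
    affects = np.asarray(affects, dtype=float)
    ili = np.asarray(ili, dtype=float)
    return affects[:-lag], ili[lag:]

def fit_lagged_ols(affects, ili, lag):
    """Least-squares fit with intercept; returns (coefficients, in-sample R^2)."""
    X, y = lagged_design(affects, ili, lag)
    X1 = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residual = y - X1 @ coef
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((y - y.mean()) ** 2)
    return coef, float(r2)

# Synthetic example (not study data): ILI follows the first affect by 3 weeks.
rng = np.random.default_rng(1)
affects = rng.random((80, 2))
ili = np.ones(80)
ili[3:] = 1.0 + 2.0 * affects[:-3, 0]

r2_by_lag = {lag: fit_lagged_ols(affects, ili, lag)[1] for lag in range(1, 6)}
best_lag = max(r2_by_lag, key=r2_by_lag.get)
```

On this synthetic series the fit is essentially exact at a lag of 3 weeks and weak at the other lags, which is the pattern one would look for when screening affects as leading indicators.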

6 Limitations

We note that our analysis and methodology have several limitations.

First, our social media data collection approach for identifying military versus control populations does not guarantee that the control population is not military. Also, the method used does not allow for any assumptions on the location of the Twitter user since the time series of tweets were collected based on user and not location. Second, our social media data collection approach does not allow us to claim that all tweets originating within a 25-mile radius of military locations have been produced by military personnel or their family members. Third, our affect classification models are capable of predicting affects in tweets with only a certain level of accuracy, which introduces some noise, e.g., mislabeled annotations, into our analysis. Fourth, using ILI location-specific weekly proportions as our gold standard for influenza dynamics in military populations may not necessarily be ideal. Finally, we only study correlations between affects and ILI clinical diagnosis data. As such, we cannot make any causal inference regarding these parameters.

Despite these limitations, due to the size of the analyzed dataset, we believe our conclusions regarding emotion and sentiment differences between military and control populations, ILI-Affect correlations, and cross-correlations are accurate.

7 Conclusion

We performed a novel large-scale study of automatically inferred emotions and sentiments emanating from 171 million tweets produced by military and non-military associated people across 25 locations in the U.S. and 6 international locations.

Through studying military and non-military associated Twitter communications, we found significant differences in their expressions of sentiments and emotions in social media throughout time. In general, over the course of a year, positive affects are expressed more and negative affects less in control population communications than in military population communications. This underlying theme may be a result of the different lifestyles and responsibilities of the military and civilian populations.

In addition, our analysis identified emotions and sentiments in tweets for a given location that correlate with the number of military ILI visits to medical facilities in the same area, regardless of whether the tweets are from military or non-military individuals. In general, positive affects in tweets correlate with fewer medical facility visits for ILI symptoms, and neutral affects are expressed more during times of increased ILI visits.

We then showed that by combining locations into regions, trends in the direction of correlation and the predictive usage of affects become more apparent and easily identified. In specific instances, we have identified additional characteristics within the regions that help explain the trends seen. These include similarities in emotions and sentiments expressed by all locations in a specific state, military service type, and/or locations within a region that have high or low ratios of military to civilian populations.

Our preliminary regression experiments on predicting ILI dynamics showed that affects are predictive but not sufficient to accurately generate current week ILI estimates. We found that the predictive power of sentiments and emotions varies significantly across locations. Nevertheless, we think that affects combined with other features extracted from tweets, location-specific feature selection (e.g., our findings from cross-correlation analysis), as well as deep neural network models for sequence prediction can boost prediction accuracy, and even forecast ILI dynamics several weeks in advance.

Overall, the information gained in this study exemplifies a usage of social media data to understand the correlation between psychological behavior and health in the military population and the potential for use of social media affects for prediction of ILI cases.