Introduction

Early detection of disease outbreaks is a crucial capability for hospitals and public health officials to effectively allocate resources to control the spread of diseases and treat affected patients [1, 2]. In Australia, state agencies track the number of patients who test positive for influenza and influenza-like illness (ILI). National initiatives attempt to obtain timely reporting through measures such as the Australian Sentinel Practices Research Network (ASPREN) (http://www.aspren.com.au/, last visited October 13, 2014), the National Health Call Centre Network (http://www.health.gov.au/internet/main/publishing.nsf/Content/national-health-call-centre-network-team-overview, last visited October 13, 2014) and community-level surveillance through FluTracking (http://www.flutracking.net/, last visited October 13, 2014). These systems, although enlarging the population base that is monitored, suffer from poor participation rates [3] and high costs.

Figure 1 outlines the disease prevalence pyramid, where the width of each layer represents the population size involved or monitored. The benefits of expanding the data sources used to produce disease outbreak notifications to web data and social media [4], in particular Twitter, are numerous: the data is publicly available, its access cost is low, the participation rate is high (http://www.nielsen.com/us/en/insights/news/2010/australia-getting-more-social-online-as-facebook-leads-and-twitter-grows.html, last visited October 13, 2014) and the user base is generally broad, although not uniform with respect to age groups and geographic areas. By leveraging information published on Twitter, real-time reporting across a large fraction of the population may be possible.

Figure 1: Disease prevalence pyramid and notification data sources; dotted line is proposed. Adapted from [10].

Previous studies that monitored Twitter in the US [5] and UK [6, 7] have found that it is possible to produce highly correlated predictions of influenza-affected patients from the use of Twitter alone. However, the use of Twitter is not without its problems. The volume of tweets is exceedingly large, with users producing over 200 million tweets per day globally as of mid 2011 (http://blog.twitter.com/2011/06/200-million-tweets-per-day.html, last visited October 13, 2014). The content of a message is highly condensed and often expressed in non-standard language due to the size limit of a tweet (140 characters). To render the data useful for predictions, it must be collected and analysed in real time, and its manual processing may be neither timely nor cost-efficient.

This necessitates an automatic system that can classify tweets reporting influenza cases with high accuracy. Previous work by Collier and Doan [8] has shown evidence that Naive Bayes (NB) and Support Vector Machine (SVM) classifiers, informed by a limited set of textual features created by extracting only terms contained in a health ontology, were able to classify tweets with respect to common syndromic categories.

This paper presents a study of detecting mentions of influenza in Twitter messages originating from Victoria, Australia, which is characterised by a smaller and more geographically diverse population (higher population density and high-speed/large-bandwidth internet access in metropolitan areas, and low population density and low-speed/small-bandwidth internet access in rural and regional areas) than those studied in previous work. The paper reports a thorough evaluation of an array of machine learning approaches for identifying Twitter messages that may indicate cases of ILI. Correlation with confirmed influenza cases was not within the scope of this work. The investigated methods go beyond the two popular classifiers tested previously by others, i.e. Naive Bayes and Support Vector Machine, expanding the analysis to other learning approaches such as decision trees (C4.5/J48, Random Forests, Logistic Model Trees), perceptrons and regression models (Voted Perceptron, Linear Logistic Regression, Multinomial Logistic Regression). The results suggest that machine learning techniques are able to discriminate between tweets containing mentions of influenza or relevant symptoms and irrelevant messages. In addition, our experiments show that SVM classifiers do not always return the highest performance, and alternative approaches (e.g. Multinomial Logistic Regression and Random Forests) return higher performance under specific settings. However, the results also reveal that there are only limited differences in performance across the different types of classifiers. This suggests that future research efforts on the detection of influenza-related tweets should be directed beyond improving machine learning techniques, in particular addressing how disease outbreak monitoring systems should cope with false positive notifications produced by the proposed automatic methods, as well as true influenza mentions that are not captured (i.e. false negatives).

Collection of Twitter messages and manual assessment

We obtained tweets posted in a 3.5-month period (May to August 2011), corresponding to the peak Australian flu season, all of which originated from users based in Victoria. This amounted to just over 13.5 million tweets. The tweets were captured using the ESA-AWTM architecture [9], which leverages the Twitter API and incorporates other services such as Yahoo! and Google Maps to add extra metadata (e.g. location data). Initial analysis of these tweets revealed that around 0.1-0.2% of all 13.5 million tweets reported influenza cases [10]. In order to retain a significant number of positive influenza-reporting tweets to train a classifier, while still being able to efficiently deploy computational methods, the Twitter stream was filtered to store only messages that contained keywords (and their derivatives) that may indicate cases of influenza. These keywords are listed in Table 1 and were selected by considering typical influenza symptoms as well as extending the keywords reported in previous research (e.g. Sadilek et al. [11] and Signorini et al. [12]). Note that re-tweets were removed. The application of this filtering process retained approximately 100,000 messages that were potentially influenza-related (0.75% of the initial data).

Table 1 Keywords that may indicate or exclude cases of influenza.
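The keyword-filtering step described above can be illustrated with a minimal sketch. The keyword subset, the derivative-matching rule and the re-tweet test below are simplifying assumptions for illustration only; the study applied its filter within the ESA-AWTM pipeline rather than with this code.

```python
import re

# Illustrative subset of the Table 1 keywords; the study used the full list
# and their derivatives.
FLU_KEYWORDS = ["flu", "influenza", "cough", "fever", "sore throat"]

# Match a keyword and simple derivatives (e.g. "coughing", "feverish")
# by allowing trailing word characters after the keyword.
KEYWORD_RE = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in FLU_KEYWORDS) + r")\w*",
    re.IGNORECASE,
)

def keep_tweet(text: str) -> bool:
    """Retain a tweet if it mentions a flu-related keyword and is not a re-tweet."""
    if text.lstrip().upper().startswith("RT "):   # crude re-tweet test (assumption)
        return False
    return KEYWORD_RE.search(text) is not None

print(keep_tweet("Stuck in bed with a fever and a horrible cough"))  # True
print(keep_tweet("RT @someone: flu season is here"))                 # False
```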

From these approximately 100,000 tweets, a set of 10,483 tweets was randomly selected for manual classification to assess their likelihood of reporting a case of influenza. Seventeen volunteer assessors from The Australian e-Health Research Centre, CSIRO were asked to use a scale of 0-100 to indicate how likely they thought a tweet was to represent the user reporting a case of influenza (either in themselves or in others): 0 being no flu, 100 being certain of a flu case. Figure 2 presents the results of this manual classification. The majority of filtered tweets (78.12%) were assessed as not related to ILI, although, interestingly, 6.49% of tweets were assessed as certainly related to ILI.

Figure 2: Manually classified tweets, bucketed by likelihood of influenza or ILI score.

In addition, 363 tweets from the set of those that were manually classified were assessed by multiple volunteers (three classifications per tweet on average), in an effort to measure inter-assessor agreement. The average standard deviation between the scores of tweets with multiple assessors was 4.89, indicating that the classification labels assigned by different assessors were comparable. Shorter tweets did have a higher standard deviation on average, as might be expected given that they contain less information. However, the differences between their scores were not judged large enough to require them to be treated differently. If a tweet was reviewed by more than one assessor, its average score was used for the remainder of the analysis.
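As a hedged illustration of how multiply-assessed tweets could be collapsed to a single score, the sketch below averages the scores per tweet and reports the per-tweet standard deviation used to gauge agreement; the records and column names (tweet_id, score) are hypothetical, not the study's data.

```python
import pandas as pd

# Hypothetical assessment records: one row per (tweet, assessor) judgement.
assessments = pd.DataFrame({
    "tweet_id": [1, 1, 1, 2, 3, 3],
    "score":    [90, 95, 100, 0, 40, 50],   # 0-100 likelihood of ILI
})

# Mean score per tweet (used when a tweet had multiple assessors) and the
# per-tweet standard deviation (NaN for singly-assessed tweets).
per_tweet = assessments.groupby("tweet_id")["score"].agg(["mean", "std"])
print(per_tweet)
```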

Automatic flu classification: statistical machine learning classifiers

Problem definition

The problem of detecting ILI-related Twitter messages is cast as a binary classification problem: classify a tweet as being ILI-related or not. The collected ground truth indicates the likelihood of a tweet being ILI-related as a score between 0 and 100. These scores were transformed into binary classes according to a threshold th which "defines an influenza-related tweet". We refer to the "definition of influenza-related tweet" as the process of collapsing the scores assigned to tweets into a binary classification (i.e. ILI-related or not). Thus a "loose definition" corresponds to considering as influenza-related also tweets that have been assigned a relatively low score (e.g. 50). A "strict definition" instead corresponds to considering as influenza-related only tweets assigned a high score (e.g. 100).
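A minimal sketch of this thresholding step follows, assuming the scores are held in a simple list; whether the comparison is strict (>) or inclusive (>=) is an assumption, as the paper does not state it.

```python
def binarise(scores, th):
    """Collapse 0-100 ILI-likelihood scores into binary labels:
    1 (ILI-related) if the score exceeds the threshold th, else 0."""
    return [1 if s > th else 0 for s in scores]

scores = [0, 30, 55, 80, 100]
print(binarise(scores, th=49))   # loose definition  -> [0, 0, 1, 1, 1]
print(binarise(scores, th=99))   # strict definition -> [0, 0, 0, 0, 1]
```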

Classifiers

The machine learning classification methods evaluated in this study for the task of identifying influenza-related tweets are listed below. We investigated classifiers from three broad families of machine learning approaches, namely linear classifiers, support vector machines and decision trees. The corresponding Weka version 3.6.7 [13] implementations of these classifiers were used in the empirical experiments.

Linear classifiers

  • Standard Naive Bayes classifier.

  • Linear logistic regression classifier (SimpleLogistic in Weka).

  • Multinomial logistic regression classifier with ridge estimator [14] (Logistic in Weka).

  • Voted Perceptron, which takes advantage of data that is linearly separable with large margins.

Support Vector Machine (SVM) classifiers

  • Support Vector Machine (SMO in Weka) that uses a polynomial kernel and the sequential minimal optimization algorithm by Platt [15].

  • Support Vector Machine (SPegasos in Weka) that uses a linear kernel and a stochastic variant of the primal estimated sub-gradient solver method by Shalev-Shwartz et al. [16].

Decision trees

  • C4.5 Decision Tree learner (J48 in Weka) that builds a decision tree based on information entropy as measured on training data.

  • Random Forest, an ensemble classifier method that constructs multiple decision trees during the training phase.

  • Logistic model trees classifier (LMT in weka) where logistic regression functions are used at the leaves of the tree.

Details of the classification approaches and their implementation can be found in the Weka documentation; standard settings were used as defined in the software package.
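The study used the Weka implementations with their default settings. Purely as a hedged, approximate illustration (not the actual setup), the sketch below instantiates rough scikit-learn analogues of the same classifier families; the correspondences are approximations (Perceptron stands in for the Voted Perceptron, SGDClassifier with hinge loss for SPegasos, DecisionTreeClassifier for C4.5/J48), and SimpleLogistic and LMT have no direct scikit-learn counterpart.

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression, Perceptron, SGDClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Approximate scikit-learn counterparts of the Weka classifiers evaluated in
# the study; settings are library defaults, mirroring the use of defaults here.
classifiers = {
    "NaiveBayes":      MultinomialNB(),
    "Logistic":        LogisticRegression(max_iter=1000),  # multinomial logistic regression
    "VotedPerceptron": Perceptron(),                       # plain perceptron, not voted
    "SMO":             SVC(kernel="poly"),                 # polynomial-kernel SVM
    "SPegasos":        SGDClassifier(loss="hinge"),        # linear SVM, stochastic sub-gradient
    "J48":             DecisionTreeClassifier(),           # C4.5-style decision tree
    "RandomForest":    RandomForestClassifier(),
}

# Each classifier exposes fit(X, y) and predict(X) on a feature matrix X.
```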

Features

The Medtex text analysis software [17] was used to extract features from the free text of Twitter messages. Medtex is a medical text analysis platform that has been used in previous studies on cancer reporting [18-20], radiology reconciliation [21], and medical information retrieval [22]. The Medtex architecture is characterised by a messaging framework built on the concept of message queues, producers (senders), and consumers (receivers). Because multiple message consumers can be set up in parallel to receive messages from the same queue, Medtex provides high text analysis throughput, making it an ideal framework for analysing large streams of Twitter data. While some of the specific clinical text analytic capabilities of Medtex were not used in this study (e.g. SNOMED CT and UMLS concept extraction), we extended the platform to include information extraction capabilities for specific entities that are present in Twitter messages, such as the presence of Twitter usernames (e.g. @Username), hash-tags indicating specific topics (e.g. #Topic), and emoticons (e.g. ":-)" and ";(").

The features extracted from tweets using the Medtex software include:

  • word tokens: strings identified by word boundaries such as white spaces and punctuation;

  • word stems: the root stems of the word tokens (if available); stems were extracted using the Porter stemmer algorithm [23];

  • word token n-grams: a contiguous sequence of n word tokens in a tweet; we extracted both bi-grams and tri-grams (n = 2 and n = 3);

  • binary feature representing the presence of a http:// token, identifying that the tweet contains a link to a web page;

  • binary feature representing the presence of the token @ followed by a sequence of characters, identifying that the tweet has been directed to a Twitter user or presents a mention of that user;

  • binary feature representing the presence of hashtags, i.e. tokens that start with the symbol # used to mark keywords or topics in a tweet;

  • binary feature that represents the presence of a positive (negative) emoticon, i.e. a meta-communicative pictorial representation of a facial expression that conveys a positive (negative) emotion such as happiness or love (a list of positive and negative emoticons is given in Table 2).

Table 2 List of emoticons associated with positive and negative emotions.

A total of 26,698 unique features formed the feature vocabulary for the entire set of annotated tweets used in the experiments reported in this article.
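Medtex was used for the extraction itself; the following is only a simplified Python illustration of the same feature types listed above (word tokens, Porter stems, bi-/tri-grams, and the binary Twitter-specific flags), with the emoticon lists reduced to a few examples rather than the full Table 2 and with assumed regular expressions for tokenisation.

```python
import re
from nltk.stem import PorterStemmer   # Porter stemmer, as cited in the study [23]

stemmer = PorterStemmer()
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}     # illustrative subset of Table 2
NEGATIVE_EMOTICONS = {":(", ":-(", ";("}

def extract_features(tweet: str) -> dict:
    tokens = re.findall(r"\w+", tweet.lower())
    features = {f"token={t}": 1 for t in tokens}
    features.update({f"stem={stemmer.stem(t)}": 1 for t in tokens})
    # word-token bi-grams and tri-grams
    for n in (2, 3):
        for i in range(len(tokens) - n + 1):
            features[f"{n}gram=" + " ".join(tokens[i:i + n])] = 1
    # binary Twitter-specific features
    features["has_url"] = int("http://" in tweet)
    features["has_mention"] = int(bool(re.search(r"@\w+", tweet)))
    features["has_hashtag"] = int(bool(re.search(r"#\w+", tweet)))
    features["has_pos_emoticon"] = int(any(e in tweet for e in POSITIVE_EMOTICONS))
    features["has_neg_emoticon"] = int(any(e in tweet for e in NEGATIVE_EMOTICONS))
    return features

print(extract_features("Down with the #flu again :( http://example.com @friend"))
```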

Experimental settings

To evaluate the effectiveness of the machine learning approaches investigated in this article, we set up an evaluation framework that consisted of a first set of experiments using the 10-fold cross-validation methodology and a subsequent set of experiments in which the classification models learnt in the cross-validation experiments were validated on unseen data (i.e. data not used for creating the models).

To this aim, we first constructed a balanced dataset for the cross-validation experiments, containing an equal number of positive (influenza-related) and negative (not influenza-related) instances. Specifically, the dataset contained 90% of the positive instances (i.e. tweets that had been annotated as being influenza-related) and an equal number of negative instances. These instances were randomly sampled from the respective classes. This dataset was subsequently randomly partitioned into 10 folds, and for each iteration of the cross-validation algorithm a unique combination of 9 folds was used for learning a classification model while the remaining fold was used for testing the obtained model.

A second dataset was then formed by combining the remaining 10% of positive instances with the remaining amount of negative instances: this dataset was used to validate on unseen data the models learnt through cross-validation.

The described procedure was repeated for each 'likelihood of influenza score' threshold level (th), i.e. datasets were constructed for each threshold value. Datasets varied in size across threshold values, due to the difference in the number of positive instances when considering strict (e.g. th = 99) or relaxed (e.g. th = 49) thresholds for defining an influenza-related tweet. The use of unseen data to validate the models created using n-fold cross-validation further reduces the risk that the obtained results are due to over-fitting.
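A hedged sketch of how, for a given threshold, a balanced cross-validation set and an imbalanced held-out set could be assembled from the annotated tweets is given below. The 90%/10% positive split and the equal sampling of negatives follow the description above; the data structures and the strict comparison against th are illustrative assumptions.

```python
import random

def build_datasets(instances, th, seed=0):
    """instances: list of (features, score) pairs, where score is the 0-100
    ILI-likelihood annotation. Returns a balanced set for 10-fold
    cross-validation and an imbalanced held-out set, following the
    90%/10% positive split described above."""
    rng = random.Random(seed)
    positives = [x for x in instances if x[1] > th]
    negatives = [x for x in instances if x[1] <= th]
    rng.shuffle(positives)
    rng.shuffle(negatives)

    n_pos_cv = int(0.9 * len(positives))
    cv_set = positives[:n_pos_cv] + negatives[:n_pos_cv]     # balanced (50/50)
    held_out = positives[n_pos_cv:] + negatives[n_pos_cv:]   # imbalanced unseen data
    rng.shuffle(cv_set)
    return cv_set, held_out
```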

Classification effectiveness

Effectiveness on 10-fold cross validation

Precision and recall values obtained by the studied classifiers in the 10-fold cross-validation experiments with different threshold values th are detailed in Table 3, and the F-measure values are plotted in Figure 3. The F-measure summarises the precision-recall evaluation as a balanced (harmonic) mean of the two measures. Because the dataset used for the cross-validation experiments is balanced (same number of positive and negative instances), the two target classes (i.e. influenza and not-influenza) have equal importance; a majority class classifier would therefore achieve at most a precision/recall/F-measure value of 0.5. The confusion matrices for each combination of classifier and threshold value are reported in Table 4.
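For reference, the F-measure here is assumed to be the standard balanced variant, i.e. the harmonic mean of precision and recall:

$$F = \frac{2 \cdot prec \cdot rec}{prec + rec}$$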

Table 3 Precision (prec) and recall (rec) values with respect to the 'likelihood of influenza' threshold level obtained by the studied classifiers when evaluated using 10-fold cross-validation.
Figure 3: Classifier effectiveness (F-measure) at different threshold values.

Table 4 Confusion Matrices for 10-fold cross-validation experiments.

Figure 3 suggests that, overall, all classifiers achieve better performance when a loose definition of influenza-related tweets is used, i.e. when 49 ≤ th ≤ 74, with the best F-measure value achieved by the multinomial logistic regression classifier (Logistic, 0.736 F-measure at th = 59). When stricter threshold values are used, the F-measures of all classifiers decrease, with the decrement occurring somewhere in the interval of threshold values between 74 and 84 and F-measures remaining overall stable between 84 and 99. The values of precision and recall (Table 3) show a similar trend, although the losses in performance differ between precision and recall for different classifiers. For example, the Random Forests classifier exhibits a higher loss in precision than in recall when moving from a threshold value of 49 to 99. Conversely, the Naive Bayes classifier exhibits similar losses in performance across both precision and recall when considering threshold values of 49 and 99.

Figure 3 confirms the findings of previous studies that Support Vector Machine approaches are generally better than Naive Bayes in determining whether a tweet is reporting ILI cases, e.g. [8]. However, our study reports on the performance of a wider range of classifiers. The empirical results show that there are a number of classifiers whose performance is usually bounded by that of SVMs and Naive Bayes, and that exceed SVM performance in specific circumstances. For example, while the multinomial logistic regression classifier (Logistic) generally achieves F-measures higher than Naive Bayes but lower than SVMs, it does improve over SVMs when the threshold value is 59. The multinomial logistic regression classifier in fact proves comparable to the best SVM approach (SMO - polynomial kernel and sequential minimal optimisation algorithm) when a relaxed definition of influenza is used to classify tweets. When a stricter definition of flu-related tweets is adopted, the performance of Logistic degrades, indicating poor robustness of this logistic regression classifier across threshold values for this task. A similar conclusion can be drawn for the other logistic regression approaches investigated in this study. In fact, the linear logistic regression classifier (denoted SimpleLogistic) produces F-measures comparable to those of SVMs when the threshold is set to 59, 89 or 94; however, it performs poorly at the other threshold values, i.e. 49, 74, 99.

We now examine the results reported in Table 4, i.e. the confusion matrices produced by each classifier in the cross-validation experiments. Confusion matrices provide a finer-grained understanding of classifier performance than summary measures such as the F-measure. For each matrix in the table, the first row indicates the number of tweets that are influenza-related according to the ground truth annotations and their classification according to the studied classifiers (left column: classified as influenza, i.e. true positive (TP) cases; right column: classified as non-influenza, i.e. false negative (FN) cases). Conversely, the second row indicates the number of tweets that have been assessed as not reporting influenza cases; the leftmost value corresponds to non-influenza tweets that have been erroneously classified as being influenza-related (i.e. false positive - FP), while the rightmost value corresponds to non-influenza tweets that have been correctly classified (i.e. true negative - TN).
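As a small aid in reading Table 4, the sketch below builds a confusion matrix in the same layout from binary ground-truth and predicted labels; the labels in the usage line are made up for illustration.

```python
def confusion_matrix(y_true, y_pred):
    """Confusion matrix in the layout described above:
    first row  = influenza-related ground truth:  [TP, FN]
    second row = non-influenza ground truth:      [FP, TN]"""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return [[tp, fn], [fp, tn]]

print(confusion_matrix([1, 1, 0, 0, 0], [1, 0, 0, 1, 0]))  # [[1, 1], [1, 2]]
```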

If identifying more influenza-related tweets is of key importance, then the best classifier is the one with the largest number of TP instances (or, equivalently, the lowest number of FN). In Table 4 the highest TP value has been highlighted in bold for each threshold value. For low-mid threshold values (49 ≤ th ≤ 84), the multinomial logistic regression classifier (Logistic) returns the highest number of TP instances. For higher threshold values, the highest number of TP instances is returned by the voted perceptron (th = 89, 99) and SVM classifiers (th = 94), although Logistic provides very similar results.

If, on the other hand, producing a low number of false positive influenza alerts is of key importance, then the most suitable classifier is the one that produces the lowest number of FP instances (or, equivalently, the highest number of TN); this has been highlighted in italics in Table 4. While the Random Forests classifier produces the lowest number of FP instances at low and high threshold values (th = 49, 94, 99), no single classifier consistently exhibits the lowest number of FP instances for mid values of the threshold (59 ≤ th ≤ 89).

Effectiveness on unseen data

The results obtained by the classifiers when validated against unseen data are analysed next. Tables 5 and 6 report respectively the F-measure values and the confusion matrices produced by the classifiers.

Table 5 F-measure values with respect to the threshold level obtained by the studied classifiers when evaluated on unseen data.
Table 6 Confusion Matrices obtained when testing on unseen data.

The classifiers exhibit lower F-measures when validated on the unseen dataset than when tested in the cross-validation settings. This is because the dataset used for cross-validation was balanced across the two classes (same number of influenza-related and non influenza-related instances), while the dataset used in this second experiment is heavily imbalanced towards the negative class. This means that there are many more non-influenza tweets than influenza ones: in fact, the percentage of influenza-related tweets in this dataset varies across the different threshold values, ranging between 2.22% for th = 49 and 0.74% for th = 99 (while it was 50% in the balanced dataset). Nevertheless, the results confirm the observation made in the cross-validation settings that automatically classifying tweets under a loose definition of influenza is easier than under the strict settings, i.e. all classifiers obtain higher F-measures for low threshold values than for high threshold values. The SPegasos variation of SVM does however constitute an exception, as inconsistent F-measure values are measured across the range of threshold values; in particular, the performance yielded at the lowest threshold is worse than that at any other threshold value. The values reported in the confusion matrices for SPegasos (Table 6) highlight that this classifier is unable to correctly identify a large percentage of positive instances (TP) while it correctly identifies non-influenza cases (TN) at a higher rate than other classifiers, therefore often yielding larger F-measure values due to the imbalanced nature of the dataset.

The results discussed in the previous paragraph suggest that considering F-measure values may lead to performance underestimation: an error rate on negative instances has a proportionally larger contribution than a similar error rate on positive instances. To avoid this, we calculate the balanced accuracy yielded by each classifier under the different threshold settings. Balanced accuracy Â (i.e. the average accuracy obtained on either class) is defined as [24]:

$$\hat{A} = 0.5 \cdot \frac{TP}{TP + FN} + 0.5 \cdot \frac{TN}{TN + FP} \qquad (1)$$

When contrasted with the standard accuracy measure, balanced accuracy presents the advantage that Â is high if a classifier performs equally well on both classes, while Â is low when a high (standard) accuracy is obtained only because the classifier is advantaged or penalised by an imbalanced dataset, as in this case. A majority class classifier (in this case a classifier that assigns every instance to the negative class) and a minority class classifier (all instances assigned to the positive class) will obtain a balanced accuracy equivalent to chance (i.e. 0.5). The balanced accuracy obtained by the classifiers investigated in this study is reported in Table 7.
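As a small check of Equation (1), the sketch below computes balanced accuracy from the four confusion-matrix counts; the counts in the usage line are made up, not taken from Table 7.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Balanced accuracy (Equation 1): the mean of the per-class accuracies."""
    return 0.5 * tp / (tp + fn) + 0.5 * tn / (tn + fp)

# A majority class classifier on imbalanced data: every instance predicted negative.
print(balanced_accuracy(tp=0, fn=50, tn=5000, fp=0))   # 0.5, i.e. chance level
```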

Table 7 Balanced accuracy values (Â) with respect to the threshold level obtained by the studied classifiers when evaluated on unseen data.

Values of balanced accuracy generally decrease as the threshold values increase: this confirms the previous analysis. In addition, balanced accuracy reveals that the SVM instance implemented by SPegasos performs just above chance across all threshold values. This suggests that the model learnt by SPegasos on the cross-validation data is poorly applicable to the unseen data contained in the second dataset. The performance of the other classifiers does, however, carry over to unseen data. The best values of balanced accuracy for each threshold value are highlighted in bold in Table 7. The observation that the Naive Bayes classifier constitutes a lower bound on classification performance in the cross-validation experiments is confirmed in this second experimental setting. The SMO implementation of the SVM classifier is confirmed to provide consistently high performance. The finding observed under the cross-validation settings that the multinomial regression classifier Logistic performs similarly to SMO for low threshold values while losing effectiveness for higher values is confirmed in this experiment. Conversely, in this second experiment the other logistic regression classifier, SimpleLogistic, is found to provide results similar to SMO across the different threshold values.

Conclusions

In this paper we have investigated the performance of machine learning classifiers for the task of detecting Twitter messages that mention possible cases of influenza or ILI. Our experiments considered a number of standard textual features, such as word tokens, stems and n-grams; in addition, we considered features that are specific to Twitter messages, such as the presence of Twitter usernames, hashtags (i.e. #String), URLs and emoticons. The creation and investigation of new alternative features is left for future research.

Previous studies have shown the effectiveness of SVMs over Naive Bayes classifiers. While our study confirms this result, we show that the performance of Naive Bayes can often be considered a lower bound for a wide range of alternative classifiers and that there are a number of classifiers that perform similarly to (or, under specific settings, better than) SVMs. In particular, the instance of SVM with linear kernel and stochastic gradient descent (SPegasos) tested in our study showed limited robustness when tested on a heavily imbalanced unseen dataset, despite good performance in the cross-validation experiments with balanced data.

Differences in performance between the cross-validation experiments and those on unseen data may highlight the importance of the training methodology used to form the classifiers, and in particular whether to balance or stratify the datasets used during the training and testing phases. Chai et al. [25] found that classification methods trained, validated, and tested on balanced datasets overestimated classification performance when compared with testing on imbalanced (stratified) data. Similar results were found in our study, where classifiers' F-measures in the cross-validation experiments (with a balanced dataset) were higher than those achieved in the unseen-data experiments (with an imbalanced dataset). To overcome this issue and present a meaningful analysis of the results obtained on unseen data, we used the balanced accuracy measure [24], which accounts for a classifier being favoured by an imbalanced test set, as was the case for the SPegasos classifier in our experiments. We leave further investigation and analysis of training/testing methodology designs to future work.

Finally, the results also reveal that there is often only a limited difference in performance across the different investigated classifiers. This suggests that future research efforts for the detection of ILI-related tweets should be directed beyond improving machine learning techniques, in particular addressing how disease outbreak monitoring systems should cope with false positive notifications produced by the proposed automatic methods. Knowing the exact number of true ILI-related tweets may not be necessary in the setting of a disease outbreak monitoring system, as increases or decreases in the trends of tweets classified as likely to be ILI-related may be sufficient to correctly suggest disease outbreaks. This hypothesis requires further investigation and is left for future work.