Abstract
Social media outlets such as Twitter provide invaluable information for understanding the social and political climate surrounding particular issues. Millions of people who vary in age, social class, and political beliefs come together in conversation. However, drawing inferences from these tweets is challenging. Using tweets from the 2016 U.S. Presidential campaign, this work addresses one main research question: can changes in a political candidate’s poll score trends be accurately predicted from tweets created during their campaign? The novelty of this work is that we formulate the problem as a multivariate time-series classification problem, which fits the temporal nature of tweets, rather than as a traditional attribute-based classification problem. Features that represent various aspects of support for (or opposition to) a candidate are tracked on an hour-by-hour basis. Together these form a multivariate time-series. One commonly used approach to this problem is based on a majority voting scheme, which assumes that the univariate time-series from different features are equally important. To alleviate this issue, a weighted shapelet transformation model is proposed. Extensive experiments on over 12 million tweets posted between November 2015 and January 2016 and related to the four primary candidates (Bernie Sanders, Hillary Clinton, Donald Trump and Ted Cruz) indicate that the multivariate time-series approach outperforms traditional attribute-based approaches.
References
Bermingham, A., Smeaton, A.F.: On using Twitter to monitor political sentiment and predict election results. In: Sentiment Analysis where AI meets Psychology (SAAIP), p. 2 (2011)
Gayo-Avello, D.: A meta-analysis of state-of-the-art electoral prediction from Twitter data. Social Science Computer Review, pp. 649–679 (2013)
Ghalwash, M., Radosavljevic, V., Obradovic, Z.: Utilizing temporal patterns for estimating uncertainty in interpretable early decision making. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 402–411 (2014)
Grabocka, J., Schilling, N., Wistuba, M., Schmidt-Thieme, L.: Learning time-series shapelets. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014, pp. 392–401. ACM (2014)
Graham, T., Jackson, D., Broersma, M.: New platform, old habits? Candidates’ use of Twitter during the 2010 British and Dutch general election campaigns. New Media Soc. 18(5), 765–783 (2016)
Hills, J., Lines, J., Baranauskas, E., Mapp, J., Bagnall, A.: Classification of time series by shapelet transformation. Data Min. Knowl. Disc. 28(4), 851–881 (2014)
Larsson, A.O., Moe, H.: Studying political microblogging: Twitter users in the 2010 Swedish election campaign. New Media Soc. 14, 729–747 (2012)
Mueen, A., Keogh, E., Young, N.: Logical-shapelets: an expressive primitive for time series classification. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2011, pp. 1154–1162 (2011)
O’Connor, B., Balasubramanyan, R., Routledge, B.R., Smith, N.A.: From Tweets to polls: linking text sentiment to public opinion time series. ICWSM 11(122–129), 1–2 (2010)
Patri, O.P., Sharma, A.B., Chen, H., Jiang, G., Panangadan, A.V., Prasanna, V.K.: Extracting discriminative shapelets from heterogeneous sensor data. In: 2014 IEEE International Conference on Big Data, Big Data 2014, Washington, DC, USA, 27–30 October 2014, pp. 1095–1104 (2014)
Roychoudhury, S., Ghalwash, M.F., Obradovic, Z.: False alarm suppression in early prediction of cardiac arrhythmia. In: 2015 IEEE 15th International Conference on Bioinformatics and Bioengineering (BIBE), pp. 1–6 (2015)
Sang, E.T.K., Bos, J.: Predicting the 2011 Dutch senate election results with Twitter. In: Proceedings of the Workshop on Semantic Analysis in Social Media, pp. 53–60. Association for Computational Linguistics (2012)
Shi, L., Agarwal, N., Agrawal, A., Garg, R., Spoelstra, J.: Predicting US primary elections with Twitter (2012)
Thelwall, M., Buckley, K., Paltoglou, G.: Sentiment strength detection for the social web. J. Am. Soc. Inform. Sci. Technol. 63(1), 163–173 (2012)
Tumasjan, A., Sprenger, T.O., Sandner, P.G., Welpe, I.M.: Predicting elections with Twitter: what 140 characters reveal about political sentiment. ICWSM 10, 178–185 (2010)
Xing, Z., Pei, J., Yu, P.S., Wang, K.: Extracting interpretable features for early classification on time series. In: SIAM International Conference on Data Mining, pp. 247–258 (2011)
Ye, L., Keogh, E.: Time series shapelets: a new primitive for data mining. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2009, pp. 947–956. ACM (2009)
Acknowledgments
This research was supported in part by NSF BIGDATA grant 14476570 and ONR grant N00014-15-1-2729.
Appendix
Learning Time-Series Classification Model (LTS)
LTS [4] is a state-of-the-art univariate time-series classification model. It discovers shapelets [17]: short time-series sub-sequences that act as local discriminative patterns characterizing the target class and are used to determine time-series class membership. In the LTS model, shapelets are learned jointly with a linear classifier rather than found by searching over all possible time-series segments. More specifically, the algorithm jointly learns the weights of the classifier hyperplane and the generalized shapelets.
A shapelet of length W is a sub-sequence of an instance of the time-series. For a series of length L there can be at most \(L - W + 1\) such sub-sequences, and each can be represented as \(\{f^q_{i,j},\ldots,f^q_{i,j+W-1}\}\). The K shapelets are initialized using the K-Means centroids of all segments.
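The segment enumeration and centroid-based initialization described above can be sketched as follows. This is a minimal illustration, not the implementation used in [4]: the function names `segments`, `kmeans_centroids`, and `init_shapelets` are hypothetical, every series is assumed to be a plain list of floats, and a basic Lloyd's K-Means with a fixed iteration count stands in for whatever K-Means variant was actually used.

```python
import random


def segments(series, W):
    """Return every length-W sliding window of `series` (L - W + 1 windows)."""
    L = len(series)
    return [series[j:j + W] for j in range(L - W + 1)]


def kmeans_centroids(points, K, iters=20, seed=0):
    """Plain Lloyd's K-Means (an assumed stand-in); returns K centroids."""
    rng = random.Random(seed)
    # Start from K randomly chosen segments.
    centroids = [list(p) for p in rng.sample(points, K)]
    for _ in range(iters):
        clusters = [[] for _ in range(K)]
        for p in points:
            # Assign each segment to its nearest centroid (squared Euclidean).
            nearest = min(
                range(K),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
            clusters[nearest].append(p)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def init_shapelets(dataset, W, K):
    """Pool the segments of all series and use K-Means centroids as shapelets."""
    all_segments = [seg for series in dataset for seg in segments(series, W)]
    return kmeans_centroids(all_segments, K)
```

For example, `segments([1, 2, 3, 4, 5], 3)` yields the three windows `[1, 2, 3]`, `[2, 3, 4]`, and `[3, 4, 5]`, matching the count \(L - W + 1 = 3\).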
Equation 4 represents a linear model, where \(M_{i,k}\) is the minimum distance between the i-th series in \(T^q\) and the k-th shapelet \(S^q_k\).
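Following the formulation in [4], and assuming a bias term \(\beta_0\) together with one weight \(\beta_k\) per shapelet (symbols taken from [4] rather than defined in this appendix), such a linear model can be written as

\[ \hat{Y}^q_i = \beta_0 + \sum_{k=1}^{K} M_{i,k}\,\beta_k \]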
The minimum distance \(M_{i,k}\) is the predictor in this framework for shapelet learning and can be defined by a soft-minimum function:
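Following [4], and assuming a smoothing parameter \(\alpha < 0\) (a symbol not defined in this appendix), the soft-minimum over the \(L - W + 1\) segment distances can be written as

\[ M_{i,k} = \frac{\sum_{j=1}^{L-W+1} D_{i,k,j}\, e^{\alpha D_{i,k,j}}}{\sum_{j'=1}^{L-W+1} e^{\alpha D_{i,k,j'}}} \]

As \(\alpha \rightarrow -\infty\) this weighted average approaches the true minimum of the distances, while remaining differentiable for finite \(\alpha\).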
where \(D_{i,k,j}\) is defined as the distance between the \(j^{th}\) segment of series i and the \(k^{th}\) shapelet given by the formula
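As a rough illustration, the sketch below computes the segment distances and their soft-minimum. It assumes the mean squared Euclidean form used in [4], \(D_{i,k,j} = \frac{1}{W}\sum_{l=1}^{W}\bigl(f^q_{i,j+l-1} - S^q_{k,l}\bigr)^2\), and a smoothing parameter \(\alpha\) whose default value of \(-30\) is an assumption rather than a setting reported here. The function names are hypothetical.

```python
import math


def segment_distances(series, shapelet):
    """D_j: mean squared difference between the shapelet and each window."""
    W = len(shapelet)
    return [
        sum((series[j + l] - shapelet[l]) ** 2 for l in range(W)) / W
        for j in range(len(series) - W + 1)
    ]


def soft_minimum(distances, alpha=-30.0):
    """Differentiable approximation of min(distances); alpha must be negative."""
    # Subtracting the smallest distance inside the exponent leaves the ratio
    # unchanged but keeps every exponent <= 0, which avoids overflow.
    d_min = min(distances)
    weights = [math.exp(alpha * (d - d_min)) for d in distances]
    return sum(d * w for d, w in zip(distances, weights)) / sum(weights)
```

Because the soft-minimum is a weighted average of the distances, it is never smaller than the true minimum and moves closer to it as \(\alpha\) becomes more negative.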
Equation 7 shows the regularized objective function, composed of a logistic loss defined by Eq. 8 and the regularization terms.
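Following [4], a regularized objective of this kind can be written as below. The number of training series \(N\), the binary labels \(Y_i \in \{0, 1\}\), the regularization weight \(\lambda\), and the squared-norm form of the penalty are assumptions taken from [4] rather than stated in this appendix.

\[ \min_{S^q,\,\beta}\; \sum_{i=1}^{N} \mathcal{L}\bigl(Y_i, \hat{Y}^q_i\bigr) + \lambda \lVert \beta \rVert^2 \]

\[ \mathcal{L}\bigl(Y, \hat{Y}\bigr) = -Y \ln \sigma\bigl(\hat{Y}\bigr) - (1 - Y) \ln\bigl(1 - \sigma\bigl(\hat{Y}\bigr)\bigr) \]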
Equation 7 is optimized using a stochastic gradient descent algorithm. The weights \(\beta \) and the shapelet \(S^q\) are jointly learned to minimize the objective function. Once the model is learned, classifying an unknown instance is simply computing \(\hat{Y^q_t}\) for the t-th test instance of the q-th feature and determining the class label via Eq. 9
where \(\sigma (\cdot )\) denotes the sigmoid function.
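The classification step can be sketched as follows, assuming the linear model takes the form \(\hat{Y} = \beta_0 + \sum_k M_k \beta_k\) as in [4] and that the label is obtained by thresholding \(\sigma(\hat{Y})\) at 0.5. The threshold, the 0/1 label encoding, and the function names are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(M_row, beta, beta0, threshold=0.5):
    """Return (label, probability) for one test instance.

    M_row holds the soft-minimum distance of the instance to each of the
    K learned shapelets; beta holds the K classifier weights.
    """
    y_hat = beta0 + sum(m * b for m, b in zip(M_row, beta))
    prob = sigmoid(y_hat)
    return (1 if prob >= threshold else 0), prob
```

In this sketch, an instance with `M_row = [0.0, 2.0]`, `beta = [-1.0, 1.0]`, and `beta0 = 0.0` gives \(\hat{Y} = 2\), so the predicted probability exceeds 0.5 and the label is 1.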
For more details about individual gradient computation of the objective function, the reader is referred to [4].
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Mirowski, T., Roychoudhury, S., Zhou, F., Obradovic, Z. (2016). Predicting Poll Trends Using Twitter and Multivariate Time-Series Classification. In: Spiro, E., Ahn, YY. (eds) Social Informatics. SocInfo 2016. Lecture Notes in Computer Science, vol 10046. Springer, Cham. https://doi.org/10.1007/978-3-319-47880-7_17
DOI: https://doi.org/10.1007/978-3-319-47880-7_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47879-1
Online ISBN: 978-3-319-47880-7
eBook Packages: Computer Science, Computer Science (R0)