Going Back in Time to Predict the Future - The Complex Role of the Data Collection Period in Social Media Analytics

Abstract

In the context of events that involve public voting, such as televised competitions or elections, it has increasingly been recognized that communication data from social media is related to the outcome. Existing studies mainly analyse the number of messages and their sentiment, yet the role of different data collection periods has not been examined sufficiently. We collected Twitter data in 2015 and 2016 to examine the relationship between the audience voting of the Eurovision Song Contest and predictors based on quantity and emotions, and compared the results of using data from before and during the event. We found that the choice of time period greatly affected the results obtained. Data collected prior to the event exhibited a much stronger association with the final ranking than data collected during the event. In addition, the model based on pre-event data in 2015 showed considerable accuracy in predicting the 2016 results, illustrating the usefulness of social media data for predicting the outcomes of events outside social media.

Notes

  1. A search for a word such as “eurovision” also returns all tweets that use the word as a hashtag, i.e. “#eurovision”.

  2. https://www.eurovision.tv/page/results Accessed 29 November 2017.

Author information

Corresponding author

Correspondence to Stefan Stieglitz.

About this article

Cite this article

Stieglitz, S., Meske, C., Ross, B. et al. Going Back in Time to Predict the Future - The Complex Role of the Data Collection Period in Social Media Analytics. Inf Syst Front 22, 395–409 (2020). https://doi.org/10.1007/s10796-018-9867-2

Keywords

  • Social media analytics
  • Time period
  • Predictive analytics
  • Eurovision
  • Twitter