# Determining the Veracity of Rumours on Twitter

## Abstract

While social networks can provide an ideal platform for up-to-date information from individuals across the world, they have also proved to be places where rumours fester and where accidental or deliberate misinformation often emerges. In this article, we aim to support the task of making sense of social media data and, specifically, seek to build an autonomous message classifier that filters relevant and trustworthy information from Twitter. For our work, we collected about 100 million public tweets, including users’ past tweets, from which we identified 72 rumours (41 true, 31 false). We considered over 80 trustworthiness measures, covering the authors’ profiles and past behaviour, their social-network connections (graphs), and the content of the tweets themselves. We ran modern machine-learning classifiers over those measures to produce trustworthiness scores at various time windows from the outbreak of each rumour. These time windows were key, as they gave useful insight into how the rumours progressed. Our findings show that our model was significantly more accurate than those of similar studies in the literature. We also identified the critical attributes of the data that give rise to the assigned trustworthiness scores. Finally, we developed a software demonstration with a visual user interface that allows users to examine the analysis.
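The abstract describes the pipeline only at a high level. Per-rumour trustworthiness measures are computed over time windows counted from the rumour's outbreak. These measures are then fed to machine-learning classifiers that output a trustworthiness score.

The sketch below illustrates that shape under loudly stated assumptions:

- The `Tweet` record, the four features and the helper names (`window_features`, `train_logistic`, `trust_score`) are hypothetical stand-ins. They are not the article's 80-plus measures or its actual pipeline.
- Logistic regression, one of the listed keywords, is fitted from scratch with plain batch gradient descent using only the Python standard library. A real system would more likely rely on a library such as scikit-learn.

```python
"""Hypothetical sketch: time-windowed features feeding a logistic-regression
trust score. The Tweet fields and the four features are illustrative
assumptions, not the article's actual trustworthiness measures."""
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Tweet:
    hours_since_outbreak: float  # time since the rumour's first tweet
    author_followers: int
    author_verified: bool
    is_retweet: bool


def window_features(tweets: Sequence[Tweet], window_hours: float) -> List[float]:
    """Aggregate illustrative features over tweets seen within `window_hours`."""
    seen = [t for t in tweets if t.hours_since_outbreak <= window_hours]
    if not seen:
        return [0.0, 0.0, 0.0, 0.0]
    n = len(seen)
    return [
        math.log1p(n),                                          # volume so far
        sum(t.author_verified for t in seen) / n,               # share of verified authors
        sum(t.is_retweet for t in seen) / n,                    # share of retweets
        sum(math.log1p(t.author_followers) for t in seen) / n,  # mean log-followers
    ]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def trust_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Score in (0, 1) read as the model's belief that the rumour is true."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_logistic(X: List[List[float]], y: List[int],
                   lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = trust_score(w, b, xi) - yi  # d(log-loss)/dz for one example
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    # Toy data: two rumours spread by verified accounts (labelled true = 1)
    # and two spread by unverified accounts (labelled false = 0).
    def toy_rumour(verified: bool) -> List[Tweet]:
        return [Tweet(h, 100, verified, h > 1) for h in (0.5, 2.0, 5.0, 30.0)]

    # One model and one score per time window, measured from the outbreak.
    for window in (1.0, 6.0, 48.0):
        X = [window_features(toy_rumour(v), window) for v in (True, True, False, False)]
        w, b = train_logistic(X, [1, 1, 0, 0])
        print(window, round(trust_score(w, b, X[0]), 3), round(trust_score(w, b, X[2]), 3))
```

Calling `window_features` with several `window_hours` values for the same rumour yields one feature vector per window. A separate score can therefore be produced for each stage of the rumour's progression. The four-rumour toy data only shows the data flow; it says nothing about the accuracy the article reports.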

## Keywords

Logistic Regression, Propagation Graph, Random Forest Model, Decision Tree Algorithm, Trustworthiness Score

## Notes

### Acknowledgements

This work was partly supported by UK Defence Science and Technology Labs under Centre for Defence Enterprise grant CDE42008. We thank Andrew Middleton for his helpful comments during the project. We would also like to thank Nathaniel Charlton and Matthew Edgington for their assistance in collecting and preprocessing part of the data.
