Event detection is crucial to assuring public safety around real-world events. Decision makers draw on a range of terrestrial and online sources to develop policies and to react appropriately to events as they unfold. One such online source is social media. Twitter is a popular micro-blogging web application serving hundreds of millions of users, and its user-generated content can serve as a rich source of information for identifying real-world events. In this paper, we present a novel detection framework for identifying such events from Twitter data, with a focus on ‘disruptive’ events. The approach consists of five steps: data collection, pre-processing, classification, clustering and summarization. We use a Naïve Bayes classification model and an Online Clustering method, and we validate our model over multiple real-world data sets. To the best of our knowledge, this study is the first effort to identify real-world events in Arabic from social media.
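The classification, clustering and summarization steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer, the add-one smoothing, the cosine-similarity threshold of 0.3, the centroid-based summary, and every function and class name below are assumptions chosen for illustration. In particular, it omits Arabic-specific pre-processing such as stemming.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters (stand-in for real pre-processing)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def fit(self, docs, labels):
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        tokens = tokenize(doc)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def online_cluster(docs, threshold=0.3):
    """Single pass: join the most similar cluster, or open a new one."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [str]}
    for doc in docs:
        vec = Counter(tokenize(doc))
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["centroid"].update(vec)
            best["members"].append(doc)
        else:
            clusters.append({"centroid": Counter(vec), "members": [doc]})
    return clusters


def summarize(cluster):
    """Pick the member closest to the cluster centroid as its summary."""
    return max(cluster["members"],
               key=lambda d: cosine(Counter(tokenize(d)), cluster["centroid"]))
```

In use, a classifier trained on labelled tweets would filter the incoming stream, and only tweets labelled as event-related would be passed to `online_cluster`; each resulting cluster is then a candidate event that `summarize` reduces to one representative tweet.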


Text mining, Information Extraction, Classification, Online-Clustering, Machine Learning, Event detection





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Cardiff School of Computer Science and Informatics, Cardiff University, Cardiff, UK
