Volume 21, Issue 2, pp 365–388

Efficient online extraction of keywords for localized events in Twitter

  • Hamed Abdelhaq
  • Michael Gertz
  • Ayser Armiti


Messages published via social media sites, such as Twitter, Facebook, and Foursquare, hide a considerable amount of information about real-world events. The timely identification of such events from this huge, unstructured, and noisy user-generated content plays an important role in increasing situation awareness and in supporting applications such as recommendation systems. A large number of these messages are enriched with location information, owing to recent advances in location acquisition techniques. This, in turn, enables location-aware event mining, i.e., the detection and tracking of localized events such as sports events, demonstrations, or traffic jams. The main building blocks of a localized event are local keywords, that is, keywords that exhibit a surge in usage at the event location. In this paper, we propose an approach that extracts local keywords from a stream of Twitter messages by (1) identifying local keywords and (2) estimating the central location of each keyword. This extraction is performed online using a sliding window over the Twitter stream. Additionally, we address the problem of spatial outliers, which adversely affect the sound identification of local keywords. Spatial outliers occur when people far away from the location of an event use related keywords in their tweets. We handle this problem by adjusting the spatial distribution of keywords based on their co-occurrence with place names that may refer to the location of an event. To ensure scalability, we use a hierarchical spatial index to gradually prune the geographic space and thus perform complex spatial computations efficiently. Extensive comparative experiments are conducted using Twitter data. The analysis of the experimental results shows that our approach outperforms existing methods in terms of both efficiency and precision.
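The two steps named in the abstract, identifying local keywords and estimating a central location for each over a sliding window, can be illustrated with a minimal sketch. The abstract does not specify the locality measure, the location estimator, or the index layout, so everything below is an assumption made for illustration: a flat uniform grid stands in for the hierarchical spatial index, Shannon entropy over grid cells stands in for the locality score, and the centroid of the busiest cell stands in for the central location. The class name `KeywordWindow` and its parameters are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, deque


class KeywordWindow:
    """Sliding window over geotagged messages (illustrative sketch only)."""

    def __init__(self, window_seconds, cell_deg=0.1):
        self.window_seconds = window_seconds
        self.cell_deg = cell_deg   # grid cell side length in degrees (assumed)
        self.items = deque()       # entries: (timestamp, keyword, lat, lon)

    def add(self, timestamp, keywords, lat, lon):
        """Insert one message, then evict entries that left the window.

        Assumes messages arrive with non-decreasing timestamps.
        """
        for kw in keywords:
            self.items.append((timestamp, kw, lat, lon))
        cutoff = timestamp - self.window_seconds
        while self.items and self.items[0][0] <= cutoff:
            self.items.popleft()

    def _cell(self, lat, lon):
        # Map a coordinate to the integer index of its grid cell.
        return (math.floor(lat / self.cell_deg), math.floor(lon / self.cell_deg))

    def _points(self, keyword):
        return [(lat, lon) for _, kw, lat, lon in self.items if kw == keyword]

    def spatial_entropy(self, keyword):
        """Shannon entropy (bits) of the keyword's usage over grid cells.

        Lower values mean usage is concentrated in fewer cells.
        Returns None if the keyword is not in the current window.
        """
        cells = Counter(self._cell(lat, lon) for lat, lon in self._points(keyword))
        total = sum(cells.values())
        if total == 0:
            return None
        return sum(-(c / total) * math.log2(c / total) for c in cells.values())

    def central_location(self, keyword):
        """Centroid of the keyword's points in its most frequently used cell.

        Plain averaging of latitude and longitude is adequate within one
        small cell; it would not be correct across the antimeridian.
        """
        points = self._points(keyword)
        if not points:
            return None
        busiest = Counter(self._cell(*p) for p in points).most_common(1)[0][0]
        members = [p for p in points if self._cell(*p) == busiest]
        n = len(members)
        return (sum(p[0] for p in members) / n, sum(p[1] for p in members) / n)
```

In this sketch, a keyword used only around one venue yields an entropy near zero, while a keyword spread evenly over many cells yields a high entropy, so a threshold on the score (also an assumption) could separate local from non-local keywords. The sketch does not handle the spatial outliers discussed in the abstract; the place-name-based adjustment the authors describe is not reproduced here.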


Keywords: Local keywords, Localized event, Event detection, Social media


Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. moovel Group GmbH, Stuttgart, Germany
  2. Heidelberg University, Heidelberg, Germany
