Real-Time Story Detection and Video Retrieval from Social Media Streams



This chapter introduces two key tools for journalists. Before they can begin verifying an online video, journalists need to determine which news story the video concerns and to find candidate online videos about that story. To this end, we assessed prior research on topic detection and developed a keyword graph-based method for discovering news stories in Twitter streams. We then developed a technique for selecting candidate online videos for news stories, using the detected stories to form queries against social networks. This enables Web-scale retrieval of videos associated with news stories. We present both techniques and evaluate them by inspecting the detected stories and the news videos returned for those stories, demonstrating state-of-the-art precision and recall that let journalists quickly identify videos for verification and re-use.
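To make the idea of keyword graph-based story discovery more concrete, the following is a minimal sketch and not the chapter's actual implementation. It assumes that keywords are plain lowercase tokens, that an edge links two keywords co-occurring in at least `min_weight` posts, and that each connected component of the graph is one candidate story. The stop list, the thresholds, and the function names `keywords`, `build_graph`, `detect_stories`, and `story_query` are all hypothetical, introduced only for this illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical stop list, chosen only for this illustration.
STOPWORDS = {"the", "a", "in", "of", "to", "and", "is", "on", "for", "at"}


def keywords(text):
    """Lowercase, strip punctuation, drop stopwords and very short tokens."""
    tokens = (t.strip(".,!?:;#@\"'").lower() for t in text.split())
    return sorted({t for t in tokens if len(t) > 2 and t not in STOPWORDS})


def build_graph(posts, min_weight=2):
    """Weight each keyword pair by the number of posts it co-occurs in,
    then keep only pairs seen in at least `min_weight` posts."""
    weights = Counter()
    for post in posts:
        # keywords() returns a sorted list, so each pair has a stable order.
        weights.update(combinations(keywords(post), 2))
    graph = defaultdict(set)
    for (a, b), w in weights.items():
        if w >= min_weight:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def detect_stories(graph):
    """Assumption: each connected component is one candidate story."""
    seen, stories = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        stories.append(sorted(component))
    return stories


def story_query(story, max_terms=3):
    """Join the first few story keywords into a search query string."""
    return " ".join(story[:max_terms])
```

In such a sketch, the string returned by `story_query` would be passed to a social network's search interface to retrieve candidate videos. A real system would likely need burst detection, cluster merging and splitting, and a ranking of keywords by importance, none of which is shown here.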



The work described in this chapter would not have been possible without the efforts and ideas of many colleagues over the years. In particular, we acknowledge Walter Rafelsberger, who initiated the story clustering implementation; Svitlana Vakulenko, who first experimented with story detection on Twitter streams; Shu Zhu, who contributed to cluster merging, splitting, and burst detection; and Roland Pajuste, who cleaned up the resulting code and worked on quality improvements and optimizations to make it more efficient.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. MODUL Technology GmbH, Vienna, Austria
  2. webLyzard technology gmbh, Vienna, Austria
