The Citizen IS the Journalist: Automatically Extracting News from the Swarm

  • João Marcos de Oliveira
  • Peter A. Gloor
Conference paper
Part of the Springer Proceedings in Complexity book series (SPCOM)


User-generated content has become a major trend in today’s journalistic ecosystem, where in many cases news arrives on social media platforms before it reaches mainstream media. In today’s hyperconnected society, this is becoming more frequent, and “news-like” information is being produced all over the Internet: on blogs, on Facebook or Twitter, on Wikipedia, or on any other platform that allows users to share their ideas and experiences. In this chapter, we describe SwarmPulse, a system that combs through Wikipedia and Twitter to extract newsworthy items. We measured the accuracy of SwarmPulse by comparing it against the Reuters and CNN RSS feeds and the Google News feed. We found a precision of 83 % and a recall of 15 % against these sources.
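
As a reading aid for the two figures above, the following minimal sketch shows how precision and recall are conventionally computed when a set of extracted items is compared against a set of reference items. It is not the SwarmPulse evaluation code: the function name `precision_recall`, the toy identifiers, and the assumption that an extracted item counts as a match only when it appears verbatim in the reference set are all hypothetical, and the abstract does not say how matches were decided.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) for two collections of item identifiers.

    Assumption: an extracted item matches only if the same identifier
    appears in the reference collection (exact set membership).
    """
    extracted, reference = set(extracted), set(reference)
    matched = extracted & reference
    # Precision: share of extracted items that also appear in the reference.
    precision = len(matched) / len(extracted) if extracted else 0.0
    # Recall: share of reference items that the system extracted.
    recall = len(matched) / len(reference) if reference else 0.0
    return precision, recall


# Toy identifiers for illustration only; these are not data from the paper.
extracted = {"a", "b", "c", "d"}
reference = {"a", "b", "c", "e", "f", "g", "h", "i", "j", "k"}
p, r = precision_recall(extracted, reference)
print(f"precision={p:.2f} recall={r:.2f}")
```

Read this way, a high precision paired with a low recall means that most extracted items also appear in the reference feeds, while the extracted items cover only a small share of what those feeds publish.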


News Item, Twitter User, News Channel, Breaking News, Match Strength
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Federal University of Juiz de Fora, Juiz de Fora, Brazil
  2. MIT Center for Collective Intelligence, Cambridge, USA
