A Learned Approach for Ranking News in Real-Time Using the Blogosphere

  • Richard McCreadie
  • Craig Macdonald
  • Iadh Ounis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7024)


Newspaper websites and news aggregators rank news stories by their newsworthiness in real time for display to the user. Recent work has shown that news stories can be ranked automatically in a retrospective manner based upon related discussion within the blogosphere. However, it is as yet undetermined whether blogs are sufficiently fresh to rank stories in real time. In this paper, we propose a novel learning-to-rank framework which leverages current blog posts to rank news stories in a real-time manner. We evaluate our proposed learning framework within the context of the TREC Blog track top stories identification task. Our results show that, indeed, the blogosphere can be leveraged for the real-time ranking of news, including for unpredictable events. Our approach improves upon state-of-the-art story ranking approaches, outperforming both the best TREC 2009/2010 systems and its own single best-performing feature.
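The abstract does not describe the features or the learner, so the following Python is only a toy sketch under our own assumptions, not the authors' framework. It illustrates the two ideas the abstract combines. First, story features are computed solely from blog posts published before the ranking time, which is what makes the ranking real-time rather than retrospective. Second, the feature weights are learned from pairwise preferences between stories. The names `BlogPost`, `Story`, `features`, `train_pairwise` and `rank`, the 24-hour window, the term-overlap relatedness test, the pairwise perceptron and all data below are hypothetical stand-ins.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BlogPost:
    published: datetime
    terms: set[str]


@dataclass
class Story:
    story_id: str
    terms: set[str]


def features(story: Story, posts: list[BlogPost], now: datetime,
             window: timedelta = timedelta(hours=24)) -> list[float]:
    """Hypothetical features computed only from posts published in (now - window, now].

    Posts newer than `now` are ignored, so a score never uses future evidence.
    """
    recent = [p for p in posts if now - window < p.published <= now]
    # Assumed relatedness test: the post shares at least one term with the story.
    related = [p for p in recent if story.terms & p.terms]
    volume = float(len(related))
    # Mean Jaccard similarity between the story and its related posts.
    overlap = (
        sum(len(story.terms & p.terms) / len(story.terms | p.terms) for p in related) / len(related)
        if related else 0.0
    )
    return [volume, overlap]


def train_pairwise(pairs: list[tuple[list[float], list[float]]], n_features: int,
                   epochs: int = 10, lr: float = 0.1) -> list[float]:
    """Pairwise perceptron; each pair is (more newsworthy story's features, less newsworthy one's)."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            # Update only when the current weights fail to rank `better` above `worse`.
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def rank(stories: list[Story], posts: list[BlogPost], now: datetime,
         w: list[float]) -> list[str]:
    """Return story ids ordered by descending linear score at time `now`."""
    scored = [(sum(wi * fi for wi, fi in zip(w, features(s, posts, now))), s.story_id)
              for s in stories]
    return [sid for _, sid in sorted(scored, key=lambda pair: -pair[0])]


# Toy usage with made-up data.
now = datetime(2010, 1, 13, 12, 0)
posts = [
    BlogPost(now - timedelta(hours=1), {"earthquake", "haiti"}),
    BlogPost(now - timedelta(hours=2), {"haiti", "aid"}),
    BlogPost(now - timedelta(hours=3), {"earthquake", "rescue"}),
    BlogPost(now - timedelta(hours=5), {"election", "poll"}),
    BlogPost(now - timedelta(hours=30), {"election"}),  # outside the 24-hour window
]
stories = [
    Story("election", {"election", "results"}),
    Story("haiti", {"haiti", "earthquake"}),
    Story("sport", {"football"}),
]
# Hypothetical training pairs, as if taken from an earlier, already-judged day.
weights = train_pairwise([([3.0, 0.5], [1.0, 0.3]), ([1.0, 0.3], [0.0, 0.0])], n_features=2)
example_ranking = rank(stories, posts, now, weights)
print(example_ranking)  # ['haiti', 'election', 'sport']
```

The pairwise perceptron is used here only because it is about the smallest learner that still optimises an ordering of stories; any learning-to-rank method could take its place without changing the time-windowed feature computation.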







Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Richard McCreadie (1)
  • Craig Macdonald (1)
  • Iadh Ounis (1)
  1. School of Computing Science, University of Glasgow, UK
