Clustering-Based Searching and Navigation in an Online News Source

  • Simón C. Smith
  • M. Andrea Rodríguez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3936)


The growing volume of news posted online demands new algorithms that support topic detection, search, and navigation of news documents. This work presents a topic detection algorithm that takes into account both the temporal evolution of news and the structure of web documents. It then uses the resulting topics to support searching and navigating an online news source. An experimental evaluation on a collection of online news in Spanish shows the advantages of incorporating time and document structure into news topic detection. It also shows that topic-based clusters are well suited to guiding the search and navigation of news.
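The abstract does not describe the algorithm in detail. As a rough illustration only, the Python sketch below shows one common way to fold time and document structure into topic detection. It uses single-pass incremental clustering (a standard baseline in topic detection and tracking) over term vectors that weight title terms more heavily than body terms. Cosine similarity to each cluster is decayed by the time since that cluster last received a story. The weights, the similarity threshold, the half-life, and the names `NewsItem`, `Cluster`, `weighted_terms`, and `single_pass` are hypothetical assumptions. This is not the authors' method.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical field weights (assumption): title terms count more than body terms.
TITLE_WEIGHT = 2.0
BODY_WEIGHT = 1.0


@dataclass
class NewsItem:
    title: str
    body: str
    day: int  # publication day index (assumption: integer days)


@dataclass
class Cluster:
    centroid: Counter = field(default_factory=Counter)  # summed term weights
    last_day: int = 0                                   # day of the latest member
    members: list = field(default_factory=list)


def weighted_terms(item: NewsItem) -> Counter:
    """Term-frequency vector in which title terms get a higher weight."""
    vec = Counter()
    for term in item.title.lower().split():
        vec[term] += TITLE_WEIGHT
    for term in item.body.lower().split():
        vec[term] += BODY_WEIGHT
    return vec


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(items, threshold=0.3, half_life=7.0):
    """Assign each story, in time order, to the most similar cluster or open a new one.

    Similarity is multiplied by 0.5 ** (gap / half_life), where gap is the number
    of days since the cluster last received a story (hypothetical temporal decay).
    """
    clusters = []
    for item in sorted(items, key=lambda x: x.day):
        vec = weighted_terms(item)
        best, best_sim = None, 0.0
        for c in clusters:
            decay = 0.5 ** ((item.day - c.last_day) / half_life)
            sim = cosine(vec, c.centroid) * decay
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            # A summed centroid has the same direction as the mean, and cosine ignores scale.
            best.centroid.update(vec)
            best.last_day = item.day
            best.members.append(item)
        else:
            clusters.append(Cluster(Counter(vec), item.day, [item]))
    return clusters


items = [
    NewsItem("earthquake hits chile", "a strong earthquake hit central chile", 0),
    NewsItem("chile earthquake damage", "damage from the earthquake in chile", 1),
    NewsItem("football final tonight", "the national football final is tonight", 1),
]
for i, c in enumerate(single_pass(items)):
    print(i, [m.title for m in c.members])
```

With these toy inputs the two earthquake stories fall into one cluster and the football story opens a second. A real system would also remove stopwords, apply stemming, and use TF-IDF weights rather than raw counts.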







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Simón C. Smith (1)
  • M. Andrea Rodríguez (1)
  1. Department of Computer Science, University of Concepción; Center for Web Research, University of Chile; Concepción, Chile
