Finding Similar RSS News Articles Using Correlation-Based Phrase Matching

  • Maria Soledad Pera
  • Yiu-Kai Ng
Conference paper

DOI: 10.1007/978-3-540-76719-0_34

Part of the Lecture Notes in Computer Science book series (LNCS, volume 4798)
Cite this paper as:
Pera M.S., Ng YK. (2007) Finding Similar RSS News Articles Using Correlation-Based Phrase Matching. In: Zhang Z., Siekmann J. (eds) Knowledge Science, Engineering and Management. KSEM 2007. Lecture Notes in Computer Science, vol 4798. Springer, Berlin, Heidelberg

Abstract

Traditional phrase matching approaches, which can discover documents containing exactly the same phrases, fail to detect documents that include phrases which are semantically relevant but not exact matches. We propose a correlation-based phrase matching (CPM) model that detects RSS news articles containing not only exactly matching phrases but also semantically relevant ones; together, these phrases determine the degree of similarity between any two articles. As the number of RSS news feeds on the Internet continues to increase, our CPM approach becomes more significant, since it minimizes the workload of users who would otherwise have to scan through a huge number of news articles to find related articles of interest, a tedious and often impossible task. Experimental results show that our CPM model, applied to matching bigrams and trigrams, outperforms other phrase matching approaches, including keyword matching.
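
The contrast the abstract draws between exact and semantically relevant phrase matching can be illustrated with a small sketch. This is not the paper's CPM model, whose formulas are not given in the abstract. It assumes a hypothetical table of word-correlation factors (WORD_CORRELATION, with made-up values), scores a pair of n-grams by the product of position-wise word correlations, and averages the best match for each n-gram. All function names are illustrative.

```python
def ngrams(text, n):
    """Return the word n-grams of a text as a list of lowercase tuples."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Hypothetical word-correlation factors in [0, 1]; the values are invented
# for illustration and do not come from the paper.
WORD_CORRELATION = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("crash", "collision")): 0.8,
}


def word_sim(a, b):
    """1.0 for identical words, the table value if listed, otherwise 0.0."""
    if a == b:
        return 1.0
    return WORD_CORRELATION.get(frozenset((a, b)), 0.0)


def phrase_sim(p, q):
    """Assumed phrase score: product of position-wise word similarities."""
    sim = 1.0
    for a, b in zip(p, q):
        sim *= word_sim(a, b)
    return sim


def exact_score(doc1, doc2, n=2):
    """Fraction of doc1's n-grams that appear verbatim in doc2."""
    g1, g2 = ngrams(doc1, n), set(ngrams(doc2, n))
    if not g1:
        return 0.0
    return sum(1 for g in g1 if g in g2) / len(g1)


def correlated_score(doc1, doc2, n=2):
    """Average, over doc1's n-grams, of the best phrase_sim against doc2."""
    g1, g2 = ngrams(doc1, n), ngrams(doc2, n)
    if not g1 or not g2:
        return 0.0
    return sum(max(phrase_sim(p, q) for q in g2) for p in g1) / len(g1)


a = "car crash downtown"
b = "automobile collision downtown"
print(exact_score(a, b))                 # no bigram is shared verbatim
print(round(correlated_score(a, b), 2))  # related bigrams still contribute
```

With these invented factors, the two headlines share no bigram verbatim, so the exact score is 0.0, while the correlation-based score is about 0.76 because "car crash" and "automobile collision" are treated as related phrases.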

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Maria Soledad Pera (1)
  • Yiu-Kai Ng (1)
  1. Computer Science Dept., Brigham Young University, Provo, Utah 84602, U.S.A.
