Finding Similar RSS News Articles Using Correlation-Based Phrase Matching
Cite this paper as: Pera M.S., Ng YK. (2007) Finding Similar RSS News Articles Using Correlation-Based Phrase Matching. In: Zhang Z., Siekmann J. (eds) Knowledge Science, Engineering and Management. KSEM 2007. Lecture Notes in Computer Science, vol 4798. Springer, Berlin, Heidelberg
Traditional phrase matching approaches, which discover documents containing exactly the same phrases, fail to detect documents whose phrases are semantically relevant but not exact matches. We propose a correlation-based phrase matching (CPM) model that detects RSS news articles containing not only phrases that are exactly the same but also phrases that are semantically relevant, and these phrases dictate the degree of similarity of any two articles. As the number of RSS news feeds on the Internet continues to increase, our CPM approach becomes more significant, since it minimizes the workload of the user, who would otherwise have to scan through a huge number of news articles to find related articles of interest, a tedious and often impossible task. Experimental results show that our CPM model, applied to matching bigrams and trigrams, outperforms other phrase matching approaches, including keyword matching.