Integrating Background Knowledge into Nearest-Neighbor Text Classification
This paper describes two different approaches for incorporating background knowledge into nearest-neighbor text classification. Our first approach uses background text to assess the similarity between training and test documents rather than assessing their similarity directly. The second method redescribes examples using Latent Semantic Indexing on the background knowledge, assessing document similarities in this redescribed space. Our experimental results show that both approaches can improve the performance of nearest-neighbor text classification. These methods are especially useful when labeling text is a labor-intensive job and when there is a large amount of information available about a specific problem on the World Wide Web.
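The first approach can be illustrated with a small sketch. It is not the authors' implementation: the TF-IDF weighting, the sum-of-products combination rule, and the names `bridged_similarity`, `classify`, `TRAINING`, and `BACKGROUND` are all assumptions made for illustration. The idea shown is that a training document and a test document that share no words can still be linked through a background document that resembles both.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer (an assumption; any tokenizer would do).
    return text.lower().split()

def build_idf(docs):
    # Smoothed inverse document frequency over the given documents.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(text, idf):
    # Sparse TF-IDF vector as a dict; unseen terms get weight 0.
    counts = Counter(tokenize(text))
    return {t: c * idf.get(t, 0.0) for t, c in counts.items()}

def cosine(u, v):
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def bridged_similarity(train_vec, test_vec, background_vecs):
    # Hypothetical combination rule: a background document contributes
    # only if it is similar to BOTH the training and the test document.
    return sum(cosine(train_vec, b) * cosine(b, test_vec)
               for b in background_vecs)

def classify(test_text, training, background):
    # 1-nearest-neighbor using the bridged similarity instead of the
    # direct similarity between training and test documents.
    idf = build_idf([text for text, _ in training] + background)
    background_vecs = [tfidf(b, idf) for b in background]
    test_vec = tfidf(test_text, idf)
    scored = [(bridged_similarity(tfidf(text, idf), test_vec, background_vecs), label)
              for text, label in training]
    best_score, best_label = max(scored, key=lambda pair: pair[0])
    return best_label, best_score

# Toy data, invented for this sketch.
TRAINING = [("jaguar speed", "animal"), ("sedan engine", "car")]
BACKGROUND = [
    "the jaguar is a feline predator of the rainforest",
    "a sedan is a car with an engine and four doors",
]

if __name__ == "__main__":
    label, score = classify("feline predator", TRAINING, BACKGROUND)
    print(label)
```

In this toy data the test document "feline predator" has no word in common with either training document, so a direct comparison cannot separate the classes. The first background document overlaps with both "jaguar speed" and the test document, which gives the "animal" example a nonzero bridged score.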
Keywords: Background Knowledge, Test Document, Latent Semantic Analysis, Latent Semantic Indexing, Label Training