Integrating Background Knowledge into Nearest-Neighbor Text Classification

  • Sarah Zelikovitz
  • Haym Hirsh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2416)

Abstract

This paper describes two different approaches for incorporating background knowledge into nearest-neighbor text classification. Our first approach uses background text to assess the similarity between training and test documents rather than assessing their similarity directly. The second method redescribes examples using Latent Semantic Indexing on the background knowledge, assessing document similarities in this redescribed space. Our experimental results show that both approaches can improve the performance of nearest-neighbor text classification. These methods are especially useful when labeling text is a labor-intensive job and when there is a large amount of information available about a specific problem on the World Wide Web.
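As a rough illustration of the first approach, the sketch below scores a training document against a test document through a set of background documents instead of comparing the two directly. The scoring rule (a sum of products of cosine similarities), the function names, and the toy data are all assumptions made for this example; this is not the authors' implementation, only one simple way the idea could be realized.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-split text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def bridged_similarity(train_vec, test_vec, background_vecs):
    """Assumed scoring rule: a background document contributes when it is
    similar to BOTH the training document and the test document."""
    return sum(cosine(train_vec, b) * cosine(b, test_vec) for b in background_vecs)


def classify(test_text, training, background, k=1):
    """Nearest-neighbor classification using background-bridged similarity.

    training   -- list of (text, label) pairs
    background -- list of unlabeled background texts
    """
    test_vec = vectorize(test_text)
    bg_vecs = [vectorize(b) for b in background]
    scored = sorted(
        (
            (bridged_similarity(vectorize(text), test_vec, bg_vecs), label)
            for text, label in training
        ),
        reverse=True,
    )
    # Weighted vote among the k highest-scoring training documents.
    votes = Counter()
    for score, label in scored[:k]:
        votes[label] += score
    return votes.most_common(1)[0][0]


# Hypothetical toy data: the test text shares no words with any training text,
# so a direct comparison gives zero similarity to every training document.
TRAINING = [("quantum field", "physics"), ("protein folding", "biology")]
BACKGROUND = [
    "quantum field theory describes electron spin and particle interactions",
    "protein folding determines enzyme structure in the cell",
]

predicted = classify("electron spin", TRAINING, BACKGROUND)
```

In this toy setup the short test text "electron spin" has no terms in common with either training document, yet the first background document mentions terms from both the test text and the "physics" training example, so the bridged score links them. This mirrors the situation the abstract highlights, where labeled text is scarce but related unlabeled text is plentiful.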

Keywords

Background Knowledge; Test Document; Latent Semantic Analysis; Latent Semantic Indexing; Label Training
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Sarah Zelikovitz (1)
  • Haym Hirsh (1)
  1. Computer Science Department, Rutgers University, Piscataway
