Improving Text Categorization Using Domain Knowledge

  • Jingbo Zhu
  • Wenliang Chen
Conference paper

DOI: 10.1007/11428817_10

Part of the Lecture Notes in Computer Science book series (LNCS, volume 3513)
Cite this paper as:
Zhu J., Chen W. (2005) Improving Text Categorization Using Domain Knowledge. In: Montoyo A., Muñoz R., Métais E. (eds) Natural Language Processing and Information Systems. NLDB 2005. Lecture Notes in Computer Science, vol 3513. Springer, Berlin, Heidelberg

Abstract

In this paper, we propose an approach to improving document classification using domain knowledge. We first introduce a domain knowledge dictionary, NEUKD, and propose two models that use domain knowledge as textual features for text categorization. The first is the BOTW model, which uses domain-associated terms and conventional words as textual features. The second is the BOF model, which uses domain features as textual features. Because the size of the domain knowledge dictionary is limited, we apply a machine learning technique to address this limitation and propose the BOL model, which can be considered an extended version of the BOF model. In the comparison experiments, a naïve Bayes system based on the BOW model serves as the baseline. Experimental results for naïve Bayes systems based on the four models (BOW, BOTW, BOF and BOL) show that domain knowledge is very useful for improving text categorization. The BOTW model performs better than the BOW model, and the BOL and BOF models perform better than the BOW model when the number of features is small. By learning new features with a machine learning technique, the BOL model performs better than the BOF model.
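
The difference between the BOW, BOTW and BOF feature sets described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy dictionary, its entries, the greedy longest-match lookup, and the choice to let a matched term replace its component words are hypothetical and are not taken from NEUKD or from the paper. The BOL model and the naïve Bayes classifier are left out because the abstract does not describe how they work.

```python
from collections import Counter

# Hypothetical toy dictionary: domain-associated term -> domain feature.
# The real format and contents of NEUKD are not given in the abstract.
TOY_DICTIONARY = {
    "free kick": "football",
    "goalkeeper": "football",
    "interest rate": "finance",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def match_terms(tokens, dictionary, max_len=3):
    """Greedy longest-match lookup of dictionary terms (an assumed strategy).

    Returns the matched terms and the remaining conventional words.
    """
    terms, words = [], []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in dictionary:
                terms.append(candidate)
                i += n
                break
        else:
            # No dictionary term starts here: keep the token as a plain word.
            words.append(tokens[i])
            i += 1
    return terms, words


def bow_features(text):
    """BOW: every word is a feature."""
    return Counter(tokenize(text))


def botw_features(text, dictionary):
    """BOTW: domain-associated terms plus the remaining conventional words."""
    terms, words = match_terms(tokenize(text), dictionary)
    return Counter(terms) + Counter(words)


def bof_features(text, dictionary):
    """BOF: only the domain features that the matched terms map to."""
    terms, _ = match_terms(tokenize(text), dictionary)
    return Counter(dictionary[term] for term in terms)


if __name__ == "__main__":
    sample = "The goalkeeper blocked the free kick"
    print(bow_features(sample))
    print(botw_features(sample, TOY_DICTIONARY))
    print(bof_features(sample, TOY_DICTIONARY))
```

In this sketch the BOF representation is much smaller than the BOW one, which fits the abstract's remark that BOF and BOL are compared with BOW when the number of features is small. It also shows why a limited dictionary is a problem: a document with no matched terms produces no BOF features at all.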

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Jingbo Zhu (1)
  • Wenliang Chen (1)
  1. Natural Language Processing Lab, Institute of Computer Software and Theory, Northeastern University, Shenyang, P.R. China
