This paper presents a Bayes document classifier that uses phrases as features. The phrases are extracted by a grammar whose rules are applied iteratively to the sequence of words in a document. The grammar is generated from a training set using statistical word association. We report improved classification over the “bag of words” representation.
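
As a rough illustration of iterative, association-driven phrase extraction, the following Python sketch learns an ordered list of merge rules from a tokenized training set and then applies them to a new document. It is not the authors' grammar. The abstract does not specify the association measure or the rule format, so the sketch assumes pointwise mutual information over adjacent word pairs and a simple "merge two neighbouring tokens" rule. The names `learn_rules`, `apply_rule`, and `extract_phrases`, and the parameters `n_rules` and `min_count`, are hypothetical.

```python
import math
from collections import Counter


def apply_rule(tokens, rule):
    """Merge every adjacent occurrence of (left, right) into one phrase token."""
    left, right = rule
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
            out.append(left + "_" + right)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def learn_rules(docs, n_rules=10, min_count=2):
    """Greedily learn merge rules from tokenized training documents.

    Assumption: association is scored by an approximate pointwise mutual
    information of adjacent pairs; the paper may use a different measure.
    """
    docs = [list(d) for d in docs]
    rules = []
    for _ in range(n_rules):
        unigrams = Counter(tok for d in docs for tok in d)
        bigrams = Counter(pair for d in docs for pair in zip(d, d[1:]))
        total = sum(unigrams.values())
        best, best_score = None, 0.0
        for (a, b), n_ab in bigrams.items():
            if n_ab < min_count:
                continue  # skip rare pairs, whose estimates are unreliable
            score = math.log(n_ab * total / (unigrams[a] * unigrams[b]))
            if score > best_score:
                best, best_score = (a, b), score
        if best is None:
            break  # no pair is positively associated any more
        rules.append(best)
        # Rewrite the corpus so later rules can build longer phrases.
        docs = [apply_rule(d, best) for d in docs]
    return rules


def extract_phrases(tokens, rules):
    """Apply the learned rules, in order, to one tokenized document."""
    for rule in rules:
        tokens = apply_rule(tokens, rule)
    return tokens
```

The resulting tokens, single words plus merged phrases, could then be counted as features for a Bayes classifier in place of plain word counts. Because each learned rule rewrites the corpus before the next rule is chosen, later rules can combine earlier phrases into longer ones.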


Feature Vector, Mutual Information, Training Corpus, Association Measure, Association Matrix
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Jan Bakus (1)
  • Mohamed Kamel (1)
  1. Pattern Analysis and Machine Intelligence Lab, Department of Systems Design Engineering, University of Waterloo, Waterloo, Canada

