Generalized Dirichlet priors for Naïve Bayesian classifiers with multinomial models in document classification
 Tzu-Tsung Wong
The generalized Dirichlet distribution has been shown to be a more appropriate prior than the Dirichlet distribution for naïve Bayesian classifiers. When the dimension of a generalized Dirichlet random vector is large, however, calculating the expected value of a random variable can be computationally expensive. In document classification, the number of distinct words, which is the dimension of the prior for a naïve Bayesian classifier, generally exceeds ten thousand. Generalized Dirichlet priors can therefore be inapplicable to document classification from the viewpoint of computational efficiency. In this paper, some properties of the generalized Dirichlet distribution are established to accelerate the calculation of the expected values of random variables. These properties are then used to construct noninformative generalized Dirichlet priors for naïve Bayesian classifiers with multinomial models. Our experimental results on two document sets show that generalized Dirichlet priors can achieve a significantly higher prediction accuracy while preserving the computational efficiency of naïve Bayesian classifiers.
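As a rough illustration of why the expected values matter computationally, the sketch below assumes the standard Connor-Mosimann parameterization of the generalized Dirichlet distribution, in which E[X_j] = α_j/(α_j+β_j) · ∏_{i<j} β_i/(α_i+β_i), and the usual conjugate update under multinomial counts. It is not taken from the paper and does not reproduce the properties established there; the function names `gd_mean` and `gd_posterior` are hypothetical. Evaluating each product from scratch costs O(k²) over k dimensions, whereas a running product gives all means in O(k).

```python
import numpy as np


def gd_mean(alpha, beta):
    """Means of the first k components of a generalized Dirichlet vector.

    Assumes the Connor-Mosimann form with parameters alpha_j, beta_j > 0.
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    total = alpha + beta
    # prefix[j] = product of beta_i / (alpha_i + beta_i) over i < j (1 for j = 0).
    prefix = np.concatenate(([1.0], np.cumprod(beta / total)[:-1]))
    return alpha / total * prefix


def gd_posterior(alpha, beta, counts):
    """Conjugate update for multinomial counts.

    `counts` has k + 1 entries: one per component plus the last category.
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # tail[j] = counts[j+1] + ... + counts[k], the counts after component j.
    tail = np.cumsum(counts[::-1])[::-1][1:]
    return alpha + counts[:-1], beta + tail


# Illustrative values only: beta_j chosen so the prior reduces to Dirichlet(2, 3, 5).
a, b = gd_posterior([2.0, 3.0], [8.0, 5.0], [1, 2, 3])
print(gd_mean(a, b))
```

With tens of thousands of words the running product can underflow in floating point, so a practical version would likely accumulate the terms in log space.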
 Journal

Data Mining and Knowledge Discovery
Volume 28, Issue 1, pp 123-144
 Cover Date
 2014-01-01
 DOI
 10.1007/s10618-012-0296-4
 Print ISSN
 1384-5810
 Online ISSN
 1573-756X
 Publisher
 Springer US
 Keywords

 Document classification
 Generalized Dirichlet distribution
 Multinomial model
 Naïve Bayesian classifier
 Authors

 Tzu-Tsung Wong (1)
 Author Affiliations

 1. Institute of Information Management, National Cheng Kung University, 1, Ta-Sheuh Road, Tainan, 701, Taiwan, ROC