A Term Association Translation Model for Naive Bayes Text Classification

  • Meng-Sung Wu
  • Hsin-Min Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7301)


Text classification (TC) has long been an important research topic in information retrieval (IR) and related areas. The bag-of-words (BoW) model is widely used to represent documents in TC and many other applications. However, because BoW ignores the relationships between terms, it offers a rather poor document representation. Previous research has shown that incorporating language models (LMs) into the naive Bayes classifier (NBC) can improve TC performance. Although the widely used N-gram LMs can exploit the relationships between words to some extent, they cannot model long-distance word dependencies. In this paper, we study term association modeling within the translation LM framework for TC. The new model, called the term association translation model (TATM), incorporates term associations into the document model by using a term translation model to capture associative terms in the documents. The TATM can be learned from either the joint probability (JP) of the associative terms through the Bayes rule or the mutual information (MI) of the associative terms. TC experiments on the Reuters-21578 and 20newsgroups corpora show that the new model, implemented in either way, outperforms both the standard NBC method and the NBC with a unigram LM.
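As a rough illustration of the joint-probability variant, the following Python sketch estimates a term translation table from document-level co-occurrence and uses it to spread a document's probability mass onto associated terms. The abstract does not give the paper's estimator, so the co-occurrence counting scheme, the function names `cooccurrence_translation` and `translated_doc_model`, and the toy corpus are all assumptions made for this example. Interpolation with a unigram LM and the NBC decision rule are omitted.

```python
from collections import Counter


def cooccurrence_translation(docs):
    """Estimate p(w | u) = p(u, w) / p(u) from document-level co-occurrence.

    Assumption: two terms are "associated" if they appear in the same
    document; self-pairs (u, u) are counted so a term can translate to itself.
    """
    pair_counts = Counter()
    for doc in docs:
        terms = set(doc)
        for u in terms:
            for w in terms:
                pair_counts[(u, w)] += 1
    # Marginal count for each source term u, summed over all targets w.
    totals = Counter()
    for (u, _w), c in pair_counts.items():
        totals[u] += c
    return {(u, w): c / totals[u] for (u, w), c in pair_counts.items()}


def translated_doc_model(doc, trans, vocab):
    """Compute p(w | d) = sum_u p(w | u) * p_ml(u | d) for every w in vocab."""
    counts = Counter(doc)
    n = len(doc)
    return {
        w: sum(trans.get((u, w), 0.0) * c / n for u, c in counts.items())
        for w in vocab
    }


# Toy training corpus (hypothetical data, for illustration only).
docs = [
    ["oil", "price", "barrel"],
    ["oil", "barrel"],
    ["match", "goal"],
]
vocab = sorted({t for d in docs for t in d})
trans = cooccurrence_translation(docs)

# "barrel" does not occur in this document, yet it receives probability
# mass because it co-occurs with "oil" and "price" in the training corpus.
model = translated_doc_model(["oil", "oil", "price"], trans, vocab)
for term in vocab:
    print(f"{term}: {model[term]:.4f}")
```

In this toy setting the translated model still sums to one over the vocabulary, and a term with no association to the document's terms ("goal") keeps zero probability. A practical classifier would need additional smoothing on top of this.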


Keywords: term association, mutual information, Bayes, translation language model, text classification





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Meng-Sung Wu (1)
  • Hsin-Min Wang (1)
  1. Institute of Information Science, Academia Sinica, Taipei, Taiwan
