Machine Learning, Volume 88, Issue 1–2, pp 157–208

Statistical topic models for multi-label document classification

  • Timothy N. Rubin
  • America Chambers
  • Padhraic Smyth
  • Mark Steyvers

Abstract

Machine learning approaches to multi-label document classification have to date largely relied on discriminative modeling techniques such as support vector machines. A drawback of these approaches is that performance rapidly drops off as the total number of labels and the number of labels per document increase. This problem is amplified when the label frequencies exhibit the type of highly skewed distributions that are often observed in real-world datasets. In this paper we investigate a class of generative statistical topic models for multi-label documents that associate individual word tokens with different labels. We investigate the advantages of this approach relative to discriminative models, particularly with respect to classification problems involving large numbers of relatively rare labels. We compare the performance of generative and discriminative approaches on document labeling tasks ranging from datasets with several thousand labels to datasets with tens of labels. The experimental results indicate that probabilistic generative models can achieve competitive multi-label classification performance compared to discriminative methods, and have advantages for datasets with many labels and skewed label frequencies.

Keywords

Topic models; LDA; Multi-label classification; Document modeling; Text classification; Graphical models; Probabilistic generative models; Dependency-LDA

Copyright information

© The Author(s) 2011

Authors and Affiliations

  • Timothy N. Rubin (1)
  • America Chambers (2)
  • Padhraic Smyth (2)
  • Mark Steyvers (1)

  1. Department of Cognitive Sciences, University of California, Irvine, Irvine, USA
  2. Department of Computer Science, University of California, Irvine, Irvine, USA
