Machine Learning, Volume 79, Issue 1–2, pp 105–121

A co-classification approach to learning from multilingual corpora

  • Massih-Reza Amini
  • Cyril Goutte


We address the problem of learning text categorization from a corpus of multilingual documents. We propose a multiview, co-regularization learning approach in which we treat each language as a separate source and minimize a joint loss that combines the monolingual classification losses in each language while ensuring consistency of the categorization across languages. We derive training algorithms for logistic regression and boosting, and show that the resulting categorizers outperform models trained independently on each language and, most of the time, even models trained on the joint bilingual data. Experiments are carried out on a multilingual extension of the RCV2 corpus, which is available for benchmarking.
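To make the idea of a joint loss concrete, the following is a minimal sketch of co-regularized logistic regression over two views (for example, a document and its translation). It is an illustration under stated assumptions, not the training algorithm derived in the paper: the abstract does not give the form of the consistency term, so the mean squared gap between the two views' predicted probabilities is assumed here, and the names `joint_loss`, `train`, `lam`, `lr` and `steps` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def joint_loss(w1, w2, X1, X2, y, lam):
    """Sum of the two monolingual log losses plus a disagreement penalty."""
    p1, p2 = sigmoid(X1 @ w1), sigmoid(X2 @ w2)
    eps = 1e-12  # guards against log(0)

    def log_loss(p):
        return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

    # Assumed consistency term: mean squared gap between the two views.
    disagreement = np.mean((p1 - p2) ** 2)
    return log_loss(p1) + log_loss(p2) + lam * disagreement


def train(X1, X2, y, lam=1.0, lr=0.05, steps=500):
    """Plain gradient descent on joint_loss, starting from zero weights."""
    n = len(y)
    w1 = np.zeros(X1.shape[1])
    w2 = np.zeros(X2.shape[1])
    for _ in range(steps):
        p1, p2 = sigmoid(X1 @ w1), sigmoid(X2 @ w2)
        gap = p1 - p2
        # Log-loss gradient plus the gradient of lam * mean(gap ** 2).
        g1 = X1.T @ ((p1 - y) + 2 * lam * gap * p1 * (1 - p1)) / n
        g2 = X2.T @ ((p2 - y) - 2 * lam * gap * p2 * (1 - p2)) / n
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2
```

Here `X1` and `X2` hold the feature vectors of the same documents in each language, `y` holds binary labels in {0, 1}, and `lam` sets how strongly the two classifiers are pushed to agree. Setting `lam=0` reduces the sketch to two independently trained monolingual classifiers, which is the baseline the abstract compares against.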


Text categorization · Multilingual data · Logistic regression · Boosting



Copyright information

© The Author(s) 2009

Authors and Affiliations

  1. Interactive Language Technologies group, National Research Council Canada, Gatineau, Canada
