Machine Learning, Volume 79, Issue 1–2, pp 123–149

Multi-domain learning by confidence-weighted parameter combination

Abstract

State-of-the-art statistical NLP systems for a variety of tasks learn from labeled training data that is often domain specific. However, there may be multiple domains or sources of interest on which the system must perform. For example, a spam filtering system must give high-quality predictions for many users, each of whom receives emails from different sources and may make slightly different decisions about what is or is not spam. Rather than learning a separate model for each domain, we explore systems that learn across multiple domains. We develop a new multi-domain online learning framework based on parameter combination from multiple classifiers. Our algorithms draw from multi-task learning and domain adaptation to adapt multiple source-domain classifiers to a new target domain, learn across multiple similar domains, and learn across a large number of disparate domains. We evaluate our algorithms on two popular NLP domain adaptation tasks: sentiment classification and spam filtering.
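The paper's combination rules build on confidence-weighted (CW) linear classifiers, which maintain a Gaussian distribution over each weight vector rather than a point estimate. As a rough illustration of the core idea only, here is a minimal NumPy sketch of one natural instantiation: averaging per-domain weight means by their precisions (inverse variances), so that parameters a domain is confident about dominate the combined classifier. The function name, the diagonal-variance representation, and the specific weighting are illustrative assumptions, not the paper's exact algorithms.

```python
import numpy as np

def combine_cw_classifiers(mus, sigmas):
    """Combine per-domain CW classifiers by precision-weighted averaging.

    Each domain d has a Gaussian over its weights, N(mu_d, diag(sigma_d)).
    Low-variance parameters are high-confidence, so we average the means
    weighted by precision (1 / variance) -- a simple instance of
    confidence-weighted parameter combination (illustrative, not the
    paper's exact rule).

    mus:    (k, d) array of per-domain mean weight vectors
    sigmas: (k, d) array of per-domain parameter variances (all > 0)
    """
    precisions = 1.0 / sigmas  # per-parameter confidence of each domain
    combined_mu = (precisions * mus).sum(axis=0) / precisions.sum(axis=0)
    return combined_mu

# Toy usage: two domains agree on feature 0 but disagree on feature 1,
# where domain 1 has very low confidence (high variance).
mus = np.array([[1.0,  2.0],
                [1.0, -0.5]])
sigmas = np.array([[0.1,  0.1],    # domain 0: confident everywhere
                   [0.1, 10.0]])   # domain 1: unsure about feature 1
w = combine_cw_classifiers(mus, sigmas)
x = np.array([0.5, 1.0])
print(w, "prediction:", np.sign(w @ x))
```

In the toy run, the combined weight for feature 1 stays close to domain 0's confident estimate (about 1.98 rather than the unweighted mean 0.75), which is the intended behavior: confident parameters carry more weight in the combination.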

Keywords

Online learning, Domain adaptation, Classifier combination, Transfer learning, Multi-task learning

Copyright information

© The Author(s) 2009

Authors and Affiliations

  1. Human Language Technology Center of Excellence, Johns Hopkins University, Baltimore, USA
  2. Department of Computer and Information Science, University of Pennsylvania, Philadelphia, USA
  3. Department of Electrical Engineering, The Technion, Haifa, Israel