Abstract
The goal of domain adaptation is to train a model on labeled data sampled from a domain different from the target domain on which the model will be deployed. We exploit unlabeled data from the target domain to train a model that maximizes likelihood over the training sample while minimizing the distance between the training and target distributions. Our focus is conditional probability models used for predicting a label structure y given input x based on features defined jointly over x and y. We propose practical measures of divergence between the two domains and use them to penalize features with large divergence, while improving the effectiveness of other, less deviant correlated features. Empirical evaluation on several real-life information extraction tasks using Conditional Random Fields (CRFs) shows that our method of domain adaptation leads to a significant reduction in error.
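The abstract's objective — maximize source-domain likelihood while penalizing features whose distributions diverge between domains — can be illustrated with a minimal sketch. This uses plain logistic regression rather than a full CRF, and both the divergence measure (gap between per-feature means) and the penalty weight `lam` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labeled source sample and unlabeled target sample with 3 features.
# Feature 2 predicts the label in the source domain but shifts in the target.
n = 200
Xs = rng.normal(size=(n, 3))                    # labeled source inputs
ys = (Xs[:, 0] + Xs[:, 2] > 0).astype(float)    # source labels
Xt = rng.normal(size=(n, 3))                    # unlabeled target inputs
Xt[:, 2] += 3.0                                 # domain shift on feature 2

# Per-feature divergence: gap between mean feature values in the two samples
# (a simple stand-in for the divergence measures the paper proposes).
d = np.abs(Xs.mean(axis=0) - Xt.mean(axis=0))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(lam, steps=2000, lr=0.1):
    """Minimize source NLL plus a divergence-weighted penalty lam * sum_k d_k * w_k^2."""
    w = np.zeros(3)
    for _ in range(steps):
        p = sigmoid(Xs @ w)
        grad = Xs.T @ (p - ys) / n + 2.0 * lam * d * w
        w -= lr * grad
    return w

w_plain = fit(lam=0.0)   # ordinary maximum likelihood
w_adapt = fit(lam=1.0)   # divergence-penalized: down-weights the shifted feature
```

Because the penalty on each weight scales with that feature's divergence, the shifted feature is suppressed while the stable, still-predictive feature carries more of the decision — the feature-subsetting effect the abstract describes.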
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Satpal, S., Sarawagi, S. (2007). Domain Adaptation of Conditional Probability Models Via Feature Subsetting. In: Kok, J.N., Koronacki, J., Lopez de Mantaras, R., Matwin, S., Mladenič, D., Skowron, A. (eds) Knowledge Discovery in Databases: PKDD 2007. Lecture Notes in Computer Science, vol 4702. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74976-9_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74975-2
Online ISBN: 978-3-540-74976-9