Bilingual Parallel Active Learning Between Chinese and English

Qian, Longhua; Liu, JiaXin; Zhou, Guodong; Zhu, Qiaoming

doi:10.1007/978-3-319-50496-4_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10102)

Abstract

Active learning is an effective machine learning paradigm that can significantly reduce the labor of manually annotating NLP corpora while achieving competitive performance. Previous studies on active learning have focused on corpora in a single language or in two languages that are translations of each other. This paper proposes a Bilingual Parallel Active Learning paradigm (BPAL). An instance-level parallel Chinese and English corpus, adapted from OntoNotes, is augmented for relation extraction, and both the seeds and the unlabeled instances jointly selected at each iteration are parallel between the two languages in order to enhance active learning. Experimental results on relation classification over this corpus demonstrate that BPAL can significantly outperform monolingual active learning. Moreover, the success of BPAL suggests a new way of annotating parallel corpora for NLP tasks so as to induce two high-performance classifiers, one in each language.
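
The joint selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so the scoring rule here (the sum of the two classifiers' prediction entropies) is an assumption chosen for illustration, not the authors' method. The function names `entropy` and `select_parallel_batch` and the parameter `batch_size` are likewise hypothetical.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_parallel_batch(
    probs_zh: List[Sequence[float]],
    probs_en: List[Sequence[float]],
    batch_size: int,
) -> List[int]:
    """Return indices of the parallel pairs with the highest joint uncertainty.

    probs_zh[i] and probs_en[i] are the label distributions predicted by the
    Chinese and English classifiers for the i-th parallel instance pair.
    ASSUMPTION: joint uncertainty is the sum of the two entropies; the paper
    may use a different criterion.
    """
    if len(probs_zh) != len(probs_en):
        raise ValueError("the two sides must be instance-level parallel")
    joint = [entropy(z) + entropy(e) for z, e in zip(probs_zh, probs_en)]
    # Most uncertain pairs first; each index selects both language sides.
    ranked = sorted(range(len(joint)), key=lambda i: joint[i], reverse=True)
    return ranked[:batch_size]


# Toy predictions for three parallel instance pairs, two relation classes.
probs_zh = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4]]
probs_en = [[0.8, 0.2], [0.5, 0.5], [0.99, 0.01]]
print(select_parallel_batch(probs_zh, probs_en, batch_size=2))  # [1, 0]
```

Because a single index refers to both the Chinese and the English instance, every selected item is annotated once and added to both training sets, which keeps the labeled data parallel across iterations.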

Notes

1. https://www.csie.ntu.edu.tw/~cjlin/libsvm/

Acknowledgement

This work is funded by the National Natural Science Foundation of China (Grant Nos. K111817913, 61373096, 61305088, and 90920004) and the Research and Innovation Project for College Graduates of Jiangsu Province (Grant No. SJLX16_0536).

Author information

Authors and Affiliations

Natural Language Processing Lab, Soochow University, Suzhou, 215006, Jiangsu, China
Longhua Qian, JiaXin Liu, Guodong Zhou & Qiaoming Zhu
School of Computer Science and Technology, Soochow University, Suzhou, 215006, Jiangsu, China
Longhua Qian, JiaXin Liu, Guodong Zhou & Qiaoming Zhu

Corresponding author

Correspondence to Longhua Qian.

Editor information

Editors and Affiliations

Microsoft Research Asia, Beijing, China
Chin-Yew Lin
Brandeis University, Waltham, Massachusetts, USA
Nianwen Xue
Peking University, Beijing, China
Dongyan Zhao
Fudan University, Shanghai, China
Xuanjing Huang
Peking University, Beijing, China
Yansong Feng

About this paper

Cite this paper

Qian, L., Liu, J., Zhou, G., Zhu, Q. (2016). Bilingual Parallel Active Learning Between Chinese and English. In: Lin, CY., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds) Natural Language Understanding and Intelligent Applications. ICCPOL 2016, NLPCC 2016. Lecture Notes in Computer Science, vol 10102. Springer, Cham. https://doi.org/10.1007/978-3-319-50496-4_10

DOI: https://doi.org/10.1007/978-3-319-50496-4_10
Published: 02 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50495-7
Online ISBN: 978-3-319-50496-4
eBook Packages: Computer Science, Computer Science (R0)
