Abstract
We design a three-layered collocation extraction tool that integrates syntactic and semantic knowledge, and apply it to China English studies. The tool first extracts peripheral collocations from dependency triples in the frequency layer, then extracts semi-peripheral collocations in the syntactic layer using association measures, and finally extracts core collocations in the semantic layer using a similar-word thesaurus. The syntactic constraints filter out much of the noise in surface co-occurrences, and the semantic constraints are effective in identifying the truly "core" collocations. We apply the tool to automatically extract collocations from a large China English corpus that we compiled, in order to explore how China English, as a variety of English, is nativized. We then analyze the similarities and differences among typical China English collocations of a group of verbs. The tool and the results can be applied to compiling language resources for Chinese-English translation and to corpus-based China studies.
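The abstract describes the three layers only at a high level. The Python sketch below shows one way such a pipeline could be organized. It is not the authors' implementation, and several parts are assumptions: the dependency triples are given directly as (head, relation, dependent) tuples instead of coming from a parser; the syntactic layer scores triples with Dunning's log-likelihood ratio G²; the function names, the thresholds `min_freq`, `min_g2` and `max_sub_ratio`, and the toy thesaurus are hypothetical; and the semantic layer is approximated by a substitution test that treats a combination as core when the dependent's similar words rarely occur with the same head and relation.

```python
import math
from collections import Counter

# A dependency triple (head, relation, dependent), e.g. ("make", "dobj", "contribution").
# Assumption: triples are supplied directly; a real tool would obtain them from a parser.
Triple = tuple[str, str, str]


def log_likelihood(o11: int, o12: int, o21: int, o22: int) -> float:
    """Dunning's log-likelihood ratio G^2 for a 2x2 contingency table."""
    n = o11 + o12 + o21 + o22
    rows = (o11 + o12, o21 + o22)
    cols = (o11 + o21, o12 + o22)
    observed = ((o11, o12), (o21, o22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute 0 to the sum
                expected = rows[i] * cols[j] / n
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def extract(triples: list[Triple], thesaurus: dict[str, list[str]],
            min_freq: int = 2, min_g2: float = 3.84, max_sub_ratio: float = 0.5):
    """Hypothetical pipeline returning (peripheral, semi_peripheral, core) sets."""
    counts = Counter(triples)

    # Layer 1 (frequency): triples seen at least min_freq times.
    peripheral = {t for t, c in counts.items() if c >= min_freq}

    # Marginal counts within each relation, used to build the 2x2 tables.
    head_rel, rel_dep, rel_total = Counter(), Counter(), Counter()
    for (h, r, d), c in counts.items():
        head_rel[(h, r)] += c
        rel_dep[(r, d)] += c
        rel_total[r] += c

    # Layer 2 (syntactic): positively associated triples with G^2 >= min_g2
    # (3.84 is the chi-square critical value for p = 0.05 with 1 d.f.).
    semi = set()
    for h, r, d in peripheral:
        o11 = counts[(h, r, d)]
        o12 = head_rel[(h, r)] - o11
        o21 = rel_dep[(r, d)] - o11
        o22 = rel_total[r] - o11 - o12 - o21
        expected = head_rel[(h, r)] * rel_dep[(r, d)] / rel_total[r]
        if o11 > expected and log_likelihood(o11, o12, o21, o22) >= min_g2:
            semi.add((h, r, d))

    # Layer 3 (semantic), hypothetical substitution test: keep a triple as "core"
    # if few of the dependent's similar words occur with the same head and relation.
    core = set()
    for h, r, d in semi:
        similar = thesaurus.get(d, [])
        if not similar:
            continue  # no similar words, so the test cannot be applied
        attested = sum(1 for s in similar if counts[(h, r, s)] > 0)
        if attested / len(similar) <= max_sub_ratio:
            core.add((h, r, d))

    return peripheral, semi, core


# Toy data (invented for illustration only).
triples = (
    [("make", "dobj", "contribution")] * 6
    + [("build", "dobj", "house")] * 6
    + [("see", "dobj", "movie")] * 2
    + [("see", "dobj", "film")] * 6
    + [("make", "dobj", "film")]
    + [("build", "dobj", "home")]
)
thesaurus = {"contribution": ["donation", "gift"], "house": ["home"],
             "film": ["movie"], "movie": ["film"]}
peripheral, semi, core = extract(triples, thesaurus)
print(sorted(core))  # [('make', 'dobj', 'contribution')]
```

On the toy data, the singleton triples never pass the frequency layer, the frequent and strongly associated triples reach the syntactic layer, and only "make a contribution" survives the substitution test, because neither "donation" nor "gift" is attested with "make". Computing each contingency table within a single relation keeps the association scores tied to the syntactic structure rather than to raw co-occurrence.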
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Cao, J., Li, D., Huang, D. (2015). A Three-Layered Collocation Extraction Tool and Its Application in China English Studies. In: Sun, M., Liu, Z., Zhang, M., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL NLP-NABD 2015. Lecture Notes in Computer Science, vol 9427. Springer, Cham. https://doi.org/10.1007/978-3-319-25816-4_4
DOI: https://doi.org/10.1007/978-3-319-25816-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25815-7
Online ISBN: 978-3-319-25816-4
eBook Packages: Computer Science, Computer Science (R0)