This paper presents an approach to building a novel two-level collocation net from a large raw corpus, which enables the collocation relationship between any two words to be computed. The first level consists of atomic classes (each comprising one word-and-feature bigram), which are clustered into the second-level class set. Each class at both levels is represented by its distribution of collocation candidates, extracted through linguistic analysis of the raw training corpus, over the possible collocation relation types. In this way, all the information extracted from the linguistic analysis is retained in the collocation net. The approach applies to both frequently and infrequently occurring words, since the clustering mechanism resolves the data sparseness problem through the collocation net. Experiments show that the collocation net is efficient and effective in alleviating data sparseness and in determining the collocation relationship between any two words.
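The core idea of the abstract can be sketched as follows: represent each atomic class by its distribution over collocation relation types, group similar atomic classes into second-level clusters, and let a sparse word borrow its cluster's distribution. The relation inventory, the example words, the probability values, and the greedy divergence-threshold clustering below are all illustrative assumptions, not the paper's actual algorithm or data.

```python
import math

# Hypothetical data: each atomic class (a word-and-feature bigram) is
# represented by its distribution over collocation relation types.
# Names and numbers are invented for illustration only.
RELATIONS = ("verb-obj", "adj-noun", "noun-noun")

atomic = {
    ("drink", "V"): [0.8, 0.1, 0.1],
    ("sip",   "V"): [0.7, 0.2, 0.1],   # a rarer word with a similar profile
    ("red",   "A"): [0.1, 0.8, 0.1],
}

def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions (base 2, in [0, 1])."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)
    return (kl(p, m) + kl(q, m)) / 2

def cluster(classes, threshold=0.1):
    """Greedy clustering: merge an atomic class into the first cluster whose
    centroid lies within `threshold` JS divergence; otherwise open a new one."""
    clusters = []  # list of (member_keys, centroid) pairs
    for key, dist in classes.items():
        for members, centroid in clusters:
            if js_divergence(dist, centroid) < threshold:
                members.append(key)
                n = len(members)
                # incremental centroid update over the cluster's members
                centroid[:] = [(c * (n - 1) + d) / n
                               for c, d in zip(centroid, dist)]
                break
        else:
            clusters.append(([key], list(dist)))
    return clusters

clusters = cluster(atomic)
# ("drink", "V") and ("sip", "V") end up in the same cluster, so a sparse
# word like "sip" can fall back on the cluster-level distribution when its
# own counts are too unreliable -- the back-off intuition behind the net.
```

This is only a minimal two-level sketch; the paper's actual construction derives the distributions from linguistic analysis of the corpus rather than from hand-set values.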


Keywords: Parse Tree · Linguistic Analysis · Atomic Class · Data Sparseness Problem · Statistical Natural Language Processing





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • GuoDong Zhou (1, 2)
  • Min Zhang (2)
  • GuoHong Fu (3)
  1. School of Computer Science and Technology, Suzhou University, China
  2. Institute for Infocomm Research, Singapore
  3. Department of Linguistics, The University of Hong Kong, Hong Kong
