Using LocalMaxs Algorithm for the Extraction of Contiguous and Non-contiguous Multiword Lexical Units

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1695)

Abstract

The availability of contiguous and non-contiguous multiword lexical units (MWUs) in Natural Language Processing (NLP) lexica enhances parsing precision, helps attachment decisions, improves indexing in information retrieval (IR) systems, and reinforces information extraction (IE) and text mining, among other applications. Unfortunately, their acquisition has long been a significant problem in NLP, IR and IE. In this paper we propose two new association measures, the Symmetric Conditional Probability (SCP) and the Mutual Expectation (ME), for the extraction of contiguous and non-contiguous MWUs. Both measures are used by a new algorithm, LocalMaxs, that requires neither empirically obtained thresholds nor complex linguistic filters. We assess the results obtained with both measures by comparing them with reference association measures (Specific Mutual Information, φ², Dice and Log-Likelihood coefficients) over a multilingual parallel corpus. An additional experiment has been carried out over a part-of-speech tagged Portuguese corpus for extracting contiguous compound verbs.
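
As a rough illustration of what an association measure computes, the sketch below scores adjacent word pairs with SCP and with two of the reference measures named above (Dice and Specific Mutual Information). The formulas are the commonly cited bigram forms and are an assumption here, since the abstract does not state them; the function name `bigram_scores` and the use of a single count `n` for all probabilities are likewise illustrative simplifications. ME and LocalMaxs are not sketched because the abstract does not define them.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Score each adjacent word pair with three association measures.

    Assumed bigram definitions (not taken from the abstract):
      SCP(x, y)  = p(x, y)^2 / (p(x) * p(y))
      Dice(x, y) = 2 * f(x, y) / (f(x) + f(y))
      SMI(x, y)  = log2(p(x, y) / (p(x) * p(y)))
    All probabilities are estimated as count / n, with n = len(tokens).
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (x, y), f_xy in bigrams.items():
        f_x, f_y = unigrams[x], unigrams[y]
        scores[(x, y)] = {
            # With a shared n, the n terms cancel out of SCP.
            "scp": f_xy ** 2 / (f_x * f_y),
            "dice": 2 * f_xy / (f_x + f_y),
            "smi": math.log2(f_xy * n / (f_x * f_y)),
        }
    return scores


if __name__ == "__main__":
    text = "new york is big and new york is old".split()
    for pair, s in sorted(bigram_scores(text).items()):
        print(pair, {k: round(v, 3) for k, v in s.items()})
```

In this toy input, "new" and "york" only ever occur together, so the pair reaches the maximum SCP and Dice value of 1.0, while a pair such as ("is", "big") scores lower because "is" also occurs before "old".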

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

da Silva, J.F., Dias, G., Guilloré, S., Pereira Lopes, J.G. (1999). Using LocalMaxs Algorithm for the Extraction of Contiguous and Non-contiguous Multiword Lexical Units. In: Barahona, P., Alferes, J.J. (eds) Progress in Artificial Intelligence. EPIA 1999. Lecture Notes in Computer Science, vol 1695. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48159-1_9

  • DOI: https://doi.org/10.1007/3-540-48159-1_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66548-9

  • Online ISBN: 978-3-540-48159-1

  • eBook Packages: Springer Book Archive
