Using LocalMaxs Algorithm for the Extraction of Contiguous and Non-contiguous Multiword Lexical Units

  • Joaquim Ferreira da Silva
  • Gaël Dias
  • Sylvie Guilloré
  • José Gabriel Pereira Lopes
Conference paper

DOI: 10.1007/3-540-48159-1_9

Part of the Lecture Notes in Computer Science book series (LNCS, volume 1695)
Cite this paper as:
da Silva J.F., Dias G., Guilloré S., Pereira Lopes J.G. (1999) Using LocalMaxs Algorithm for the Extraction of Contiguous and Non-contiguous Multiword Lexical Units. In: Barahona P., Alferes J.J. (eds) Progress in Artificial Intelligence. EPIA 1999. Lecture Notes in Computer Science, vol 1695. Springer, Berlin, Heidelberg

Abstract

The availability of contiguous and non-contiguous multiword lexical units (MWUs) in Natural Language Processing (NLP) lexica enhances parsing precision, helps attachment decisions, improves indexing in information retrieval (IR) systems, and reinforces information extraction (IE) and text mining, among other applications. Unfortunately, their acquisition has long been a significant problem in NLP, IR and IE. In this paper we propose two new association measures, the Symmetric Conditional Probability (SCP) and the Mutual Expectation (ME), for the extraction of contiguous and non-contiguous MWUs. Both measures are used by a new algorithm, LocalMaxs, that requires neither empirically obtained thresholds nor complex linguistic filters. We assess the results obtained with both measures by comparing them with reference association measures (Specific Mutual Information, φ², Dice and Log-Likelihood coefficients) over a multilingual parallel corpus. An additional experiment was carried out over a part-of-speech tagged Portuguese corpus to extract contiguous compound verbs.
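
The abstract names the association measures but does not give their formulas. As a rough illustration only, the sketch below scores adjacent word pairs with two of them, using the definitions commonly cited for bigrams: SCP(x, y) = p(x, y)² / (p(x) p(y)) and Dice(x, y) = 2 f(x, y) / (f(x) + f(y)). The function name `bigram_scores` and the choice of a single normaliser for all probabilities are assumptions made for this example, not details taken from the paper, and the sketch does not cover the n-gram generalisation, Mutual Expectation, or the LocalMaxs selection step.

```python
from collections import Counter


def bigram_scores(tokens):
    """Return {(x, y): (scp, dice)} for every adjacent word pair in tokens.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (x, y), f_xy in bigrams.items():
        f_x, f_y = unigrams[x], unigrams[y]
        # Assumption: p(x, y), p(x) and p(y) share one normaliser N,
        # so N cancels and SCP reduces to f(x, y)^2 / (f(x) * f(y)).
        scp = f_xy ** 2 / (f_x * f_y)
        # Dice coefficient computed from raw frequencies.
        dice = 2 * f_xy / (f_x + f_y)
        scores[(x, y)] = (scp, dice)
    return scores


if __name__ == "__main__":
    sample = "new york is in new york state".split()
    for pair, (scp, dice) in sorted(bigram_scores(sample).items()):
        print(pair, round(scp, 3), round(dice, 3))
```

Under these assumptions both scores lie in (0, 1], and a pair whose words never occur apart from each other in the sample receives the maximum value of 1 on both.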

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Joaquim Ferreira da Silva (1)
  • Gaël Dias (1)
  • Sylvie Guilloré (2)
  • José Gabriel Pereira Lopes (1)

  1. Departamento de Informática, Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia, Monte da Caparica, Portugal
  2. Laboratoire d’Informatique Fondamentale d’Orléans, Université d’Orléans, Orléans Cédex 2, France
