Chapter

Progress in Artificial Intelligence

Volume 1695 of the series Lecture Notes in Computer Science pp 113-132

Using LocalMaxs Algorithm for the Extraction of Contiguous and Non-contiguous Multiword Lexical Units

  • Joaquim Ferreira da Silva (Departamento de Informática, Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia)
  • Gaël Dias (Departamento de Informática, Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia)
  • Sylvie Guilloré (Laboratoire d’Informatique Fondamentale d’Orléans, Université d’Orléans)
  • José Gabriel Pereira Lopes (Departamento de Informática, Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia)

Abstract

The availability of contiguous and non-contiguous multiword lexical units (MWUs) in Natural Language Processing (NLP) lexica enhances parsing precision, helps attachment decisions, improves indexing in information retrieval (IR) systems, and reinforces information extraction (IE) and text mining, among other applications. Unfortunately, their acquisition has long been a significant problem in NLP, IR and IE. In this paper we propose two new association measures, the Symmetric Conditional Probability (SCP) and the Mutual Expectation (ME), for the extraction of contiguous and non-contiguous MWUs. Both measures are used by a new algorithm, LocalMaxs, that requires neither empirically obtained thresholds nor complex linguistic filters. We assess the results obtained with both measures by comparing them with reference association measures (Specific Mutual Information, φ², Dice and Log-Likelihood coefficients) over a multilingual parallel corpus. An additional experiment was carried out over a part-of-speech tagged Portuguese corpus to extract contiguous compound verbs.
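
To make the idea of an association measure concrete, the following is a minimal sketch of SCP restricted to contiguous bigrams. It assumes the usual bigram form SCP(x, y) = p(x, y)² / (p(x) · p(y)), which this abstract does not state, and it estimates all probabilities over the same token count so that the score reduces to f(x, y)² / (f(x) · f(y)). The function name `bigram_scp` and the toy sentence are hypothetical; the sketch does not cover longer n-grams, non-contiguous units, the ME measure, or the LocalMaxs selection step.

```python
from collections import Counter


def bigram_scp(tokens):
    """Return {(x, y): score} for every contiguous bigram in `tokens`.

    Assumption (not taken from the abstract): the score is
    f(x, y)^2 / (f(x) * f(y)), i.e. p(x | y) * p(y | x) when every
    probability is estimated over the same number of tokens.
    """
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (x, y), pair_count in bigram_counts.items():
        scores[(x, y)] = pair_count ** 2 / (unigram_counts[x] * unigram_counts[y])
    return scores


if __name__ == "__main__":
    toy_corpus = "new york is big and new york is old".split()
    # Print bigrams from the most to the least strongly associated.
    for pair, score in sorted(bigram_scp(toy_corpus).items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

On this toy input, a pair whose words only ever occur together (such as "new york") receives the maximum score of 1.0, while a pair in which one word also appears elsewhere scores lower. Such scores are only the input to the selection step; how LocalMaxs chooses units without a fixed threshold is described in the paper itself and is not reproduced here.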