Detection and Correction of Malapropisms in Spanish by Means of Internet Search

  • Igor A. Bolshakov
  • Sofia N. Galicia-Haro
  • Alexander Gelbukh
Conference paper

DOI: 10.1007/11551874_15

Part of the Lecture Notes in Computer Science book series (LNCS, volume 3658)
Cite this paper as:
Bolshakov I.A., Galicia-Haro S.N., Gelbukh A. (2005) Detection and Correction of Malapropisms in Spanish by Means of Internet Search. In: Matoušek V., Mautner P., Pavelka T. (eds) Text, Speech and Dialogue. TSD 2005. Lecture Notes in Computer Science, vol 3658. Springer, Berlin, Heidelberg

Abstract

Malapropisms are real-word errors that yield syntactically correct but semantically implausible text. We report an experiment on detecting and correcting Spanish malapropisms. A malapropos word semantically destroys the collocations (syntactically connected word pairs) in which it occurs. We therefore detect possible malapropisms as words that do not form semantically plausible collocations with neighboring words. As correction candidates, we select words that are similar to the suspected one but form plausible collocations with its neighbors. To judge the semantic plausibility of a collocation, we use Google statistics on occurrences of the word combination and of the two words taken separately. Since collocation components can be separated by other words in a sentence, the Google statistics are gathered for the most probable distance between them. The statistics are then converted into a specially defined Semantic Compatibility Index (SCI). Heuristic rules are proposed to signal malapropisms when SCI values fall below a predetermined threshold and to retain a few of the highest SCI-ranked correction candidates. Our experiments gave promising results.
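
The detect-then-correct idea in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the abstract does not give the SCI formula, so `compatibility` uses a simple stand-in score (the log of the pair count over the geometric mean of the single-word counts); the hit counts, the `SIMILAR` candidate table, the threshold value, and all function names are hypothetical and do not come from the paper.

```python
import math

# Invented hit counts standing in for search-engine statistics (not real data).
WORD_COUNTS = {"centro": 900_000, "histórico": 400_000, "histérico": 50_000}
PAIR_COUNTS = {("centro", "histórico"): 120_000, ("centro", "histérico"): 3}

# Hypothetical table of words similar to a suspected word.
SIMILAR = {"histérico": ["histórico"]}


def compatibility(v, w):
    """Stand-in plausibility score; NOT the paper's SCI definition."""
    pair = PAIR_COUNTS.get((v, w), 0)
    if pair == 0:
        return float("-inf")  # the pair was never observed together
    return math.log(pair / math.sqrt(WORD_COUNTS[v] * WORD_COUNTS[w]))


def check_pair(v, w, threshold=-5.0, keep=3):
    """Return None if (v, w) looks plausible, else ranked replacements for w."""
    if compatibility(v, w) >= threshold:
        return None  # no malapropism signalled
    scored = [(c, compatibility(v, c)) for c in SIMILAR.get(w, [])]
    plausible = [(c, s) for c, s in scored if s >= threshold]
    plausible.sort(key=lambda item: item[1], reverse=True)
    return [c for c, _ in plausible[:keep]]
```

With these invented counts, `check_pair("centro", "histórico")` signals nothing, while `check_pair("centro", "histérico")` falls below the threshold and proposes `histórico`. A real system would replace the two count tables with search-engine queries and the candidate table with a word-similarity search.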

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Igor A. Bolshakov (1)
  • Sofia N. Galicia-Haro (2)
  • Alexander Gelbukh (1)

  1. Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico
  2. Faculty of Sciences, National Autonomous University of Mexico (UNAM), Mexico
