An Experiment in Detection and Correction of Malapropisms Through the Web

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3406)

Abstract

A malapropism is a type of semantic error: it replaces one content word with another that sounds similar but is semantically incompatible with the context, and thus destroys text cohesion. We propose to signal a malapropism when a pair of syntactically linked content words in a text has a specially defined Semantic Compatibility Index (SCI) below a predetermined threshold. The SCI is computed from web statistics on how often the two words occur together and apart. Once a malapropism is detected, all possible correction candidates for both words are taken from precompiled dictionaries of paronyms, i.e. words that differ from one another by one letter or by a few prefixes or suffixes. Heuristic rules are proposed to retain only a few candidates with the highest SCI for the user’s decision. The experiment on malapropism detection and correction covers a hundred Russian text fragments, mainly from web newswire, in both correct and falsified form, as well as several hundred correction candidates. The raw occurrence statistics are taken from the Yandex web search engine. Within certain limitations, the experiment gave very promising results.
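
The detect-then-correct procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not give the SCI formula, so the code below substitutes a pointwise-mutual-information-style score as a stand-in; this is an assumption, not the paper's definition. The page counts, the English word pair, the threshold of 0, the `keep` limit, and all function names are likewise hypothetical, and the paronym test covers only one-letter differences, not the prefix and suffix variants.

```python
import math

# Hypothetical page counts standing in for search-engine statistics.
TOTAL_PAGES = 1_000_000
WORD_COUNTS = {"side": 50_000, "affect": 20_000, "effect": 40_000,
               "site": 60_000, "wide": 30_000}
PAIR_COUNTS = {("side", "affect"): 5, ("side", "effect"): 4_000,
               ("site", "affect"): 2, ("wide", "affect"): 1}


def compatibility(pair):
    """PMI-style stand-in for the SCI (assumed, not the paper's formula)."""
    n_pair = PAIR_COUNTS.get(pair, 0)
    n_first = WORD_COUNTS.get(pair[0], 0)
    n_second = WORD_COUNTS.get(pair[1], 0)
    if 0 in (n_pair, n_first, n_second):
        return float("-inf")  # never seen together: least compatible
    return math.log2(n_pair * TOTAL_PAGES / (n_first * n_second))


def one_letter_apart(a, b):
    """True if b results from a by one substitution, insertion or deletion."""
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    short, long_ = sorted((a, b), key=len)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def suggest(pair, threshold=0.0, keep=3):
    """Return ranked corrections, or [] if the pair is not flagged."""
    if compatibility(pair) >= threshold:
        return []  # compatible enough: no malapropism signaled
    first, second = pair
    # Candidates replace one word of the pair at a time with a paronym.
    candidates = [(w, second) for w in WORD_COUNTS if one_letter_apart(w, first)]
    candidates += [(first, w) for w in WORD_COUNTS if one_letter_apart(w, second)]
    scored = [(c, compatibility(c)) for c in candidates]
    # Keep only candidates that are themselves above the threshold.
    plausible = [item for item in scored if item[1] >= threshold]
    return sorted(plausible, key=lambda item: item[1], reverse=True)[:keep]


print(suggest(("side", "affect")))
```

With these toy counts, the pair "side affect" falls below the threshold and "side effect" is the only candidate that survives the filter. A real system would take the counts from a search engine and the candidates from a paronym dictionary.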

Work done with partial support from the Mexican Government (CONACyT, SNI) and CGEPI-IPN, Mexico. Many thanks to Denis Filatov for help with manuscript preparation.




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bolshakov, I.A. (2005). An Experiment in Detection and Correction of Malapropisms Through the Web. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_91

  • DOI: https://doi.org/10.1007/978-3-540-30586-6_91

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24523-0

  • Online ISBN: 978-3-540-30586-6

  • eBook Packages: Computer Science, Computer Science (R0)
