From Spelling Correction to Text Cleaning – Using Context Information

  • Martin Schierle
  • Sascha Schulz
  • Markus Ackermann
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


Spelling correction is the task of correcting misspelled words in texts. Most available spelling correction tools work only on isolated words and compute a list of spelling suggestions ranked by edit distance, letter n-gram similarity or comparable measures. Although the probability that the best-ranked suggestion is correct in the current context is high, user intervention is usually necessary to choose the most appropriate suggestion (Kukich, 1992).
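
To illustrate the isolated-word ranking described above, here is a minimal Python sketch that orders candidate words by Levenshtein distance. It is not taken from any particular spelling corrector; the function names `levenshtein` and `rank_suggestions`, the `max_distance` threshold and the tiny lexicon are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def rank_suggestions(word, lexicon, max_distance=2):
    """Return lexicon entries within max_distance, closest first.

    Ties are broken alphabetically; the surrounding context is ignored,
    which is exactly the limitation of isolated-word correction.
    """
    scored = sorted((levenshtein(word, cand), cand) for cand in lexicon)
    return [cand for dist, cand in scored if dist <= max_distance]


# Hypothetical lexicon, for illustration only.
suggestions = rank_suggestions("brake", ["break", "brakes", "bake", "engine"])
```

Because the ranking looks only at the word itself, nothing in it can tell whether the surrounding text makes one equally close candidate more plausible than another.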

Based on preliminary work by Sabsch (2006), we developed dcClean, an efficient context-sensitive spelling correction system, by combining two approaches: the edit-distance-based ranking of an open-source spelling corrector and neighbour co-occurrence statistics computed from a domain-specific corpus. In combination with domain-specific replacement and abbreviation lists, we are able to significantly improve the correction precision compared to edit-distance-based or context-based spelling correctors applied on their own.
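
The general idea of combining the two signals can be sketched in Python as follows. This is not the dcClean implementation: the scoring rule (neighbour co-occurrence count first, edit distance as tie-breaker), the function names, the `max_distance` threshold and the toy corpus are all assumptions chosen for illustration, and the replacement and abbreviation lists are left out.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance with unit costs."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def build_neighbour_counts(sentences):
    """Count how often each ordered pair of adjacent tokens occurs."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


def correct(tokens, index, lexicon, counts, max_distance=2):
    """Pick a replacement for tokens[index] using its direct neighbours.

    Assumed scoring: prefer the candidate seen most often next to the
    left and right neighbours; break ties by smaller edit distance.
    """
    word = tokens[index]
    if word in lexicon:
        return word  # known word: leave unchanged
    left = tokens[index - 1] if index > 0 else None
    right = tokens[index + 1] if index + 1 < len(tokens) else None
    best_key, best = None, word
    for cand in lexicon:
        dist = edit_distance(word, cand)
        if dist > max_distance:
            continue
        context = counts[(left, cand)] + counts[(cand, right)]
        key = (-context, dist, cand)
        if best_key is None or key < best_key:
            best_key, best = key, cand
    return best


# Toy domain corpus, for illustration only.
corpus = ["replace the brake pads",
          "the brake pads are worn",
          "take a short break now"]
counts = build_neighbour_counts(corpus)
lexicon = {tok for sentence in corpus for tok in sentence.split()}
```

In this toy setting, "brek" is closer to "break" than to "brake" by edit distance alone, yet `correct(["the", "brek", "pads"], 1, lexicon, counts)` selects "brake" because that word co-occurs with both neighbours in the corpus.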


Context Information, Word Frequency, Spelling Error, Unknown Word, Levenshtein Distance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. AL-MUBAID, H. and TRÜMPER, K. (2006): Learning to Find Context-Based Spelling Errors. In: Triantaphyllou, E. and Felici, G. (Eds.): Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques. Massive Computing Series, Springer, Heidelberg, Germany, 597-628.
  2. ELMI, M. A. and EVENS, M. (1998): Spelling correction using context. In: Proceedings of the 17th International Conference on Computational Linguistics - Volume 1 (Montreal, Quebec, Canada). Morristown, NJ, 360-364.
  3. GOLDING, A. R. (1995): A Bayesian hybrid method for context-sensitive spelling correction. In: Proceedings of the Third Workshop on Very Large Corpora, Boston, MA.
  4. GOLDING, A. R. and ROTH, D. (1999): A Winnow based approach to context-sensitive spelling correction. Machine Learning 34 (1-3):107-130. Special Issue on Machine Learning and Natural Language.
  5. HEYER, G., QUASTHOFF, U. and WITTIG, T. (2006): Text Mining: Wissensrohstoff Text - Konzepte, Algorithmen, Ergebnisse. W3L Verlag, Herdecke, Bochum.
  6. KUKICH, K. (1992): Techniques for Automatically Correcting Words in Text. ACM Comput. Surv. 24 (4):377-439.
  7. MANNING, C. and SCHÜTZE, H. (1999): Foundations of Statistical Natural Language Processing. The M.I.T. Press, Cambridge (Mass.) and London. 151-187.
  8. PHILIPS, L. SABSCH, R. (2006): Kontextsensitive und domänenspezifische Rechtschreibkorrektur durch Einsatz von Wortassoziationen. Diplomarbeit, Universität Leipzig.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Martin Schierle (1)
  • Sascha Schulz (2)
  • Markus Ackermann (3)
  1. DaimlerChrysler AG, Germany
  2. Humboldt-University, Berlin, Germany
  3. University of Leipzig, Germany
