From Spelling Correction to Text Cleaning – Using Context Information

  • Conference paper
Data Analysis, Machine Learning and Applications

Abstract

Spelling correction is the task of correcting misspelled words in texts. Most available spelling correction tools work only on isolated words and compute a list of spelling suggestions ranked by edit distance, letter n-gram similarity or comparable measures. Although the probability that the best-ranked suggestion is correct in the current context is high, user intervention is usually necessary to choose the most appropriate suggestion (Kukich, 1992).

Based on preliminary work by Sabsch (2006), we developed dcClean, an efficient context-sensitive spelling correction system that combines two approaches: the edit-distance-based ranking of an open-source spelling corrector and neighbour co-occurrence statistics computed from a domain-specific corpus. Combined with domain-specific replacement and abbreviation lists, this significantly improves correction precision compared to edit-distance-based or context-based spelling correctors applied on their own.
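The combination described above can be illustrated with a small sketch. It is not the dcClean implementation: the function names (`edit_distance`, `neighbour_counts`, `correct`), the toy corpus, the distance threshold and the scoring rule (most neighbour co-occurrences first, ties broken by smaller edit distance) are all assumptions made for illustration. Candidates come from plain Levenshtein distance rather than from an open-source spelling corrector, and the replacement and abbreviation lists are left out.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def neighbour_counts(sentences):
    """Count directly adjacent (left, right) word pairs in a tokenised corpus."""
    counts = Counter()
    for sentence in sentences:
        counts.update(zip(sentence, sentence[1:]))
    return counts


def correct(word, left, right, vocabulary, counts, max_distance=2):
    """Choose the in-vocabulary candidate that best fits its neighbours.

    Candidates are vocabulary words within max_distance edits of `word`.
    They are ranked by how often they were seen next to `left` and `right`;
    ties fall back to the smaller edit distance (illustrative rule only).
    """
    candidates = [(edit_distance(word, v), v) for v in vocabulary]
    candidates = [(d, v) for d, v in candidates if d <= max_distance]
    if not candidates:
        return word  # nothing close enough: leave the word untouched

    def rank(item):
        distance, candidate = item
        context = counts[(left, candidate)] + counts[(candidate, right)]
        return (-context, distance, candidate)

    return min(candidates, key=rank)[1]


# Hypothetical toy corpus standing in for a domain-specific corpus.
sentences = [
    ["replace", "the", "brake", "pads"],
    ["check", "the", "brake", "fluid"],
    ["take", "a", "break", "now"],
]
counts = neighbour_counts(sentences)
vocabulary = {w for s in sentences for w in s}

# "brake" and "break" are both two edits away from "braek";
# only the neighbouring words separate them.
print(correct("braek", "the", "pads", vocabulary, counts))  # brake
```

In this toy setting edit distance alone cannot decide between the two candidates, which is the situation in which isolated-word correctors fall back on user intervention; the neighbour counts resolve it automatically.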

References

  • AL-MUBAID, H. and TRÜMPER, K. (2006): Learning to Find Context-Based Spelling Errors. In: Triantaphyllou, E. and Felici, G. (Eds.): Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques. Massive Computing Series, Springer, Heidelberg, Germany, 597-628.

  • ELMI, M. A. and EVENS, M. (1998): Spelling correction using context. In: Proceedings of the 17th International Conference on Computational Linguistics - Volume 1 (Montreal, Quebec, Canada). Morristown, NJ, 360-364.

  • GOLDING, A. R. (1995): A Bayesian hybrid method for context-sensitive spelling correction. In: Proceedings of the Third Workshop on Very Large Corpora, Boston, MA.

  • GOLDING, A. R., and ROTH, D. (1999): A Winnow based approach to context-sensitive spelling correction. Machine Learning 34 (1-3):107-130. Special Issue on Machine Learning and Natural Language.

  • HEYER, G., QUASTHOFF, U. and WITTIG, T. (2006): Text Mining: Wissensrohstoff Text - Konzepte, Algorithmen, Ergebnisse. W3L Verlag, Herdecke, Bochum.

  • KUKICH, K. (1992): Techniques for Automatically Correcting Words in Text. ACM Comput. Surv. 24 (4):377-439.

  • MANNING, C. and SCHÜTZE, H. (1999): Foundations of Statistical Natural Language Processing. The M.I.T. Press, Cambridge (Mass.) and London. 151-187.

  • SABSCH, R. (2006): Kontextsensitive und domänenspezifische Rechtschreibkorrektur durch Einsatz von Wortassoziationen. Diplomarbeit, Universität Leipzig.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schierle, M., Schulz, S., Ackermann, M. (2008). From Spelling Correction to Text Cleaning – Using Context Information. In: Preisach, C., Burkhardt, H., Schmidt-Thieme, L., Decker, R. (eds) Data Analysis, Machine Learning and Applications. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78246-9_47
