From Spelling Correction to Text Cleaning – Using Context Information

Schierle, Martin; Schulz, Sascha; Ackermann, Markus

doi:10.1007/978-3-540-78246-9_47

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Spelling correction is the task of correcting misspelled words in texts. Most available spelling correction tools work only on isolated words and compute a list of suggestions ranked by edit distance, letter n-gram similarity or comparable measures. Although the best-ranked suggestion is likely to be correct in the current context, user intervention is usually still necessary to choose the most appropriate suggestion (Kukich, 1992).
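To make the isolated-word approach concrete, here is a minimal sketch that ranks lexicon entries by Levenshtein edit distance. It is not the open-source corrector used by the authors (the abstract does not name it); the function names `levenshtein` and `rank_suggestions`, the `max_dist` cutoff and the toy lexicon are hypothetical choices for illustration.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def rank_suggestions(word: str, lexicon: List[str], max_dist: int = 2) -> List[Tuple[str, int]]:
    """Return lexicon entries within max_dist of word, smallest distance first.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    scored = [(cand, levenshtein(word, cand)) for cand in lexicon]
    return sorted((s for s in scored if s[1] <= max_dist), key=lambda s: (s[1], s[0]))


if __name__ == "__main__":
    lexicon = ["break", "brake", "bread", "broke", "engine", "speed"]
    print(rank_suggestions("braek", lexicon))  # [('brake', 2), ('break', 2)]
```

For the misspelling "braek", both "brake" and "break" sit at distance 2, so a purely word-level ranking cannot decide between them; this is the situation in which a user, or context information, has to choose.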

Based on preliminary work by Sabsch (2006), we developed dcClean, an efficient context-sensitive spelling correction system that combines two approaches: the edit-distance-based ranking of an open-source spelling corrector and neighbour co-occurrence statistics computed from a domain-specific corpus. In combination with domain-specific replacement and abbreviation lists, this significantly improves correction precision compared to edit-distance-based or context-based spelling correctors applied on their own.
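The abstract does not state how dcClean weights the two signals, so the following is only a sketch under stated assumptions: replacement/abbreviation lists are consulted first, unknown words receive candidates within a fixed edit distance, and each candidate's score is a log-scaled count of how often it co-occurs with the left and right neighbour in a domain corpus, minus its edit distance. The names `neighbour_counts` and `correct`, the `weight` parameter, the scoring formula and the toy corpus are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def neighbour_counts(corpus_tokens: List[str]) -> Counter:
    """Count how often each ordered pair of adjacent tokens occurs in the corpus."""
    return Counter(zip(corpus_tokens, corpus_tokens[1:]))


def correct(tokens: List[str], i: int, lexicon: Set[str], pairs: Counter,
            replacements: Dict[str, str], max_dist: int = 2, weight: float = 1.0) -> str:
    """Correct tokens[i] by combining edit distance with neighbour co-occurrence (assumed scoring)."""
    word = tokens[i]
    # Assumption: domain-specific replacement/abbreviation entries take precedence.
    if word in replacements:
        return replacements[word]
    # Known words are left untouched.
    if word in lexicon:
        return word
    left = tokens[i - 1] if i > 0 else None
    right = tokens[i + 1] if i + 1 < len(tokens) else None
    best, best_score = word, float("-inf")  # unchanged if no candidate is close enough
    # Sorted iteration resolves score ties to the alphabetically first candidate.
    for cand in sorted(lexicon):
        dist = levenshtein(word, cand)
        if dist > max_dist:
            continue
        # Counter returns 0 for unseen pairs, including pairs with None at a sentence edge.
        context = pairs[(left, cand)] + pairs[(cand, right)]
        score = weight * math.log1p(context) - dist
        if score > best_score:
            best, best_score = cand, score
    return best


if __name__ == "__main__":
    corpus = ("press the brake pedal to stop . the brake pedal is soft . "
              "take a break after work .").split()
    pairs = neighbour_counts(corpus)
    lexicon = set(corpus)
    abbrevs = {"w/o": "without"}  # hypothetical abbreviation list
    print(correct("press the braek pedal".split(), 2, lexicon, pairs, abbrevs))    # brake
    print(correct("take a braek after work".split(), 2, lexicon, pairs, abbrevs))  # break
```

Both candidates are at edit distance 2 from "braek", so the neighbour counts decide: "the ... pedal" favours "brake", while "a ... after" favours "break". Setting `weight=0.0` removes the context term and the second call falls back to the alphabetical tie-break ("brake").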

References

AL-MUBAID, H. and TRÜMPER, K. (2006): Learning to Find Context-Based Spelling Errors. In: E. Triantaphyllou and G. Felici (Eds.): Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques. Massive Computing Series, Springer, Heidelberg, Germany, 597-628.
ELMI, M. A. and EVENS, M. (1998): Spelling Correction Using Context. In: Proceedings of the 17th International Conference on Computational Linguistics - Volume 1 (Montreal, Quebec, Canada). Morristown, NJ, 360-364.
GOLDING, A. R. (1995): A Bayesian hybrid method for context-sensitive spelling correction. In: Proceedings of the Third Workshop on Very Large Corpora, Boston, MA.
GOLDING, A. R. and ROTH, D. (1999): A Winnow-based approach to context-sensitive spelling correction. Machine Learning 34(1-3):107-130. Special Issue on Machine Learning and Natural Language.
HEYER, G., QUASTHOFF, U. and WITTIG, T. (2006): Text Mining: Wissensrohstoff Text – Konzepte, Algorithmen, Ergebnisse [Text Mining: Text as a Raw Material for Knowledge – Concepts, Algorithms, Results]. W3L Verlag, Herdecke, Bochum.
KUKICH, K. (1992): Techniques for Automatically Correcting Words in Text. ACM Computing Surveys 24(4):377-439.
MANNING, C. and SCHÜTZE, H. (1999): Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge (Mass.) and London. 151-187.
PHILIPS, L. SABSCH, R. (2006): Kontextsensitive und domänenspezifische Rechtschreibkorrektur durch Einsatz von Wortassoziationen [Context-sensitive and domain-specific spelling correction using word associations]. Diploma thesis, Universität Leipzig.

Author information

Authors and Affiliations

DaimlerChrysler AG, Germany
Martin Schierle
Humboldt-University, Berlin, Germany
Sascha Schulz
University of Leipzig, Germany
Markus Ackermann

Editor information

Editors and Affiliations

Institute of Computer Science and Institute of Business Economics and Information Systems, University of Hildesheim, Marienburgerplatz 22, 31141, Hildesheim, Germany
Christine Preisach
Lehrstuhl für Mustererkennung und Bildverarbeitung, Universität Freiburg, Gebäude 052, 79110, Freiburg i. Br., Germany
Hans Burkhardt
Institute of Computer Science and Institute of Business Economics and Information Systems, Marienburgerplatz 22, 31141, Hildesheim, Germany
Lars Schmidt-Thieme
Fakultät für Wirtschaftswissenschaften, Lehrstuhl für Betriebswirtschaftslehre, insbes. Marketing, Universitätsstraße 25, 33615, Bielefeld, Germany
Reinhold Decker

About this paper

Cite this paper

Schierle, M., Schulz, S., Ackermann, M. (2008). From Spelling Correction to Text Cleaning – Using Context Information. In: Preisach, C., Burkhardt, H., Schmidt-Thieme, L., Decker, R. (eds) Data Analysis, Machine Learning and Applications. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78246-9_47

DOI: https://doi.org/10.1007/978-3-540-78246-9_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78239-1
Online ISBN: 978-3-540-78246-9
eBook Packages: Mathematics and Statistics (R0)
