Between Sound and Spelling: Combining Phonetics and Clustering Algorithms to Improve Target Word Recovery

  • Marcos Zampieri
  • Renato Cordeiro de Amorim
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8686)


In this paper, we revisit the task of spell checking, focusing on target word recovery. We propose a new approach that relies on phonetic information to improve the accuracy of clustering algorithms in identifying misspellings and generating accurate suggestions. The use of phonetic information is not new to spell checking and has been applied successfully in previous approaches. To our knowledge, however, the combination of phonetics and cluster-based methods for spell checking has not yet been explored, and it is the new contribution of our work. We report an improvement of 8.16% in accuracy over a previously proposed spell checking approach.
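The general idea of pairing phonetic information with grouping of candidate words can be illustrated with a small sketch. This is not the method evaluated in the paper: it assumes a simplified Soundex encoding as the phonetic component, uses "words sharing a phonetic code" as a crude stand-in for a clustering step, and ranks candidates by Levenshtein distance. The function names and the toy lexicon are hypothetical.

```python
from collections import defaultdict

# Simplified Soundex consonant codes (assumption: any phonetic encoder could be used here).
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return a four-character Soundex-style code for an alphabetic word."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate consonants with the same code
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # vowels reset prev, so a repeated code after a vowel is kept
    return (result + "000")[:4]


def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def suggest(misspelling, lexicon, limit=3):
    """Rank lexicon words from the misspelling's phonetic group by edit distance."""
    groups = defaultdict(list)
    for word in lexicon:
        groups[soundex(word)].append(word)
    # Fall back to the whole lexicon when no word shares the phonetic code.
    candidates = groups.get(soundex(misspelling)) or list(lexicon)
    return sorted(candidates, key=lambda w: (levenshtein(misspelling, w), w))[:limit]


# Hypothetical toy lexicon, for illustration only.
LEXICON = ["receive", "recipe", "relieve", "deceive", "review"]
print(suggest("recieve", LEXICON))
```

In this toy lexicon, edit distance alone places "relieve" closest to "recieve" (one substitution), whereas restricting the candidates to the words that share the misspelling's phonetic code leaves "receive" as the top suggestion.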


spell checking, clustering, phonetic algorithm





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Marcos Zampieri (1)
  • Renato Cordeiro de Amorim (2)
  1. Saarland University, Germany
  2. Birkbeck University of London and Glyndŵr University, United Kingdom
