Towards a Leaner Evaluation Process: Application to Error Correction Systems

  • Arnaud Renard
  • Sylvie Calabretto
  • Béatrice Rumpler
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 141)

Abstract

Although evaluations of state-of-the-art error correction systems follow similar procedures, they invariably rely on different resources (document collections, evaluation metrics, dictionaries, etc.). As a consequence, error correction approaches cannot be compared directly: each one must be re-implemented from scratch whenever it is to be compared with a new approach. In other domains such as Information Retrieval, this problem is solved through Cranfield-like experiments such as the TREC [5] evaluation campaigns. We propose a generic solution to these evaluation difficulties: a modular evaluation platform which formalizes the similarities between evaluation procedures and provides standard sets of instantiated resources for particular domains. Since error correction was our initial concern, the set of resources presented in this article is dedicated to the evaluation of error correction systems. The goal is to make evaluation as lean as possible: authors implement only the core correction algorithm and rely on the platform for everything else.
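The separation the abstract describes, where the platform owns the shared resources and a new system contributes only its core algorithm, can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's actual implementation (which, per reference [11], builds on OSGi): the class and function names, the tiny misspelling collection, and the accuracy metric are all assumptions made for the example.

```python
# Hypothetical sketch of the "lean" evaluation idea: the platform holds a
# standard set of instantiated resources (here, a tiny test collection of
# misspellings) plus a fixed metric, while an error correction system only
# supplies its core algorithm as a callable.
from typing import Callable, List, Tuple

# core algorithm contract: misspelled word in, proposed correction out
ErrorCorrector = Callable[[str], str]

class EvaluationPlatform:
    """Owns the evaluation resources and procedure, so every system
    under test is scored against the same collection and metric."""

    def __init__(self, collection: List[Tuple[str, str]]):
        # each pair is (misspelled word, expected correction)
        self.collection = collection

    def evaluate(self, corrector: ErrorCorrector) -> float:
        # metric: fraction of misspellings corrected exactly
        correct = sum(1 for wrong, expected in self.collection
                      if corrector(wrong) == expected)
        return correct / len(self.collection)

# toy collection, in the spirit of Wikipedia's list of common
# misspellings cited as reference [19]
platform = EvaluationPlatform([("teh", "the"),
                               ("recieve", "receive"),
                               ("adress", "address")])

# a deliberately trivial "core algorithm": a lookup table that misses
# one of the three test cases
lookup = {"teh": "the", "recieve": "receive"}
accuracy = platform.evaluate(lambda w: lookup.get(w, w))
print(accuracy)
```

Under this design, comparing a new correction approach against existing ones only requires writing another `ErrorCorrector`; the collection and metric never change between runs, which is exactly what makes results directly comparable.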

Keywords

Evaluation model · Framework · Error correction · Textual documents · Distance and similarity measure · Metrics · Information retrieval

References

  1. Atkinson, K.: Aspell Spellchecker. http://aspell.net (2012). Accessed 15 Jan 2012
  2. Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
  3. Hirst, G., Budanitsky, A.: Correcting real-word spelling errors by restoring lexical cohesion. Nat. Lang. Eng. 11(1), 87–111 (2005)
  4. Hirst, G., St-Onge, D.: Lexical chains as representations of context for the detection and correction of malapropisms, Chapter 13. In: Fellbaum, C. (ed.) WordNet: An Electronic Lexical Database, pp. 305–332. MIT Press, Cambridge (1998)
  5. Kantor, P.B., Voorhees, E.M.: The TREC-5 confusion track: comparing retrieval methods for scanned text. Inf. Retrieval 2(2), 165–176 (2000)
  6. Kukich, K.: Techniques for automatically correcting words in text. ACM Comput. Surv. (CSUR) 24(4), 377–439 (1992)
  7. Mays, E., Damerau, F.J., Mercer, R.L.: Context based spelling correction. Inf. Process. Manag. 27(5), 517–522 (1991)
  8. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
  9. Mitton, R.: Ordering the suggestions of a spellchecker without using context. Nat. Lang. Eng. 15(2), 173–192 (2008)
  10. Mudge, R.: After the Deadline. http://static.afterthedeadline.com (2012). Accessed 15 Jan 2012
  11. OSGi Alliance: Open Services Gateway initiative. http://www.osgi.org (2012). Accessed 15 Jan 2012
  12. Pedler, J.: Computer correction of real-word spelling errors in dyslexic text. Ph.D. thesis, Birkbeck, University of London (2007)
  13. de Rosnay, J., Revelli, C.: Pronetarian Revolution (2006)
  14. Ruch, P.: Using contextual spelling correction to improve retrieval effectiveness in degraded text collections. In: Proceedings of the 19th International Conference on Computational Linguistics, vol. 1, p. 7. Association for Computational Linguistics (2002)
  15. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–656 (1948)
  16. Subramaniam, L.V., Roy, S., Faruquie, T.A., Negi, S.: A survey of types of text noise and techniques to handle noisy text, pp. 115–122 (2009)
  17. Varnhagen, C.K., McFall, G.P., Figueredo, L., Takach, B.S., Daniels, J., Cuthbertson, H.: Spelling and the web. J. Appl. Dev. Psychol. 30(4), 454–462 (2009)
  18. Voorhees, E.M., Garofolo, J.: The TREC-6 spoken document retrieval track. Bull. Am. Soc. Inf. Sci. Technol. 26(5), 18–19 (2000)
  19. Wikipedia Community: Wikipedia List of Common Misspellings. http://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings (2012). Accessed 15 Jan 2012
  20. Wiktionary Community: Wiktionary Online Collaborative Dictionary. http://en.wiktionary.org/wiki/Wiktionary:Main_Page (2012). Accessed 15 Jan 2012
  21. Wilcox-O'Hearn, A., Hirst, G., Budanitsky, A.: Real-word spelling correction with trigrams: a reconsideration of the Mays, Damerau, and Mercer model. In: Gelbukh, A. (ed.) CICLing 2008. LNCS, vol. 4919, pp. 605–616. Springer, Heidelberg (2008)
  22. Wong, W., Liu, W., Bennamoun, M.: Integrated scoring for spelling error correction, abbreviation expansion and case restoration in dirty text. In: Proceedings of the Fifth Australasian Conference on Data Mining and Analytics (AusDM '06), Sydney, Australia, pp. 83–89. Australian Computer Society (2006)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Arnaud Renard (1, 2)
  • Sylvie Calabretto (1, 2)
  • Béatrice Rumpler (1, 2)
  1. Université de Lyon, CNRS, Lyon, France
  2. INSA-Lyon, LIRIS, UMR 5205, Villeurbanne Cedex, France
