ICEIS 2012: Enterprise Information Systems pp 228-242
Towards a Leaner Evaluation Process: Application to Error Correction Systems
Abstract
Although they follow similar procedures, evaluations of state-of-the-art error correction systems always rely on different resources (document collections, evaluation metrics, dictionaries, etc.). As a result, error correction approaches cannot be compared directly: each one has to be re-implemented from scratch whenever it must be compared with a new one. In other domains, such as Information Retrieval, this problem is solved through Cranfield-like experiments such as the TREC [5] evaluation campaigns. We propose a generic solution to these evaluation difficulties: a modular evaluation platform that formalizes the similarities between evaluation procedures and provides standard sets of instantiated resources for particular domains. Because the evaluation of error correction systems is the problem that originally motivated this work, the set of resources presented in this article is dedicated to it. The goal is to provide the leanest possible way to evaluate an error correction system: its authors implement only the core algorithm and rely on the platform for everything else.
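The abstract does not show the platform's interfaces, so the following Python sketch only illustrates the division of labour it describes, under our own assumptions. The platform side supplies a test collection, a dictionary, metrics and a generic evaluation loop; the researcher side supplies only a correction algorithm. All names (`Sample`, `Corrector`, `evaluate`, `word_accuracy`, `error_recall`, `NearestWordCorrector`, `SAMPLES`, `DICTIONARY`) are hypothetical, test items are assumed to be token-aligned (no word splits or merges), the toy data is invented, and the nearest-dictionary-word corrector is a generic Levenshtein baseline, not a method taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol, Sequence


# --- Resources the platform would supply (hypothetical format) -------------

@dataclass(frozen=True)
class Sample:
    """A test item: noisy tokens and their token-aligned gold correction."""
    noisy: List[str]
    gold: List[str]


# Toy collection and dictionary, invented for illustration only.
SAMPLES = [
    Sample(noisy=["teh", "cat", "sat"], gold=["the", "cat", "sat"]),
    Sample(noisy=["a", "dgo", "ran"], gold=["a", "dog", "ran"]),
    Sample(noisy=["a", "cat", "ran"], gold=["a", "cat", "sat"]),  # real-word error
]
DICTIONARY = ["the", "cat", "sat", "a", "dog", "ran"]

Metric = Callable[[Sequence[Sample], Sequence[List[str]]], float]


def word_accuracy(samples: Sequence[Sample], outputs: Sequence[List[str]]) -> float:
    """Fraction of all tokens whose corrected form equals the gold token."""
    total = right = 0
    for sample, out in zip(samples, outputs):
        for gold, pred in zip(sample.gold, out):
            total += 1
            right += gold == pred
    return right / total if total else 0.0


def error_recall(samples: Sequence[Sample], outputs: Sequence[List[str]]) -> float:
    """Among tokens that were wrong in the input, fraction restored to gold."""
    errors = fixed = 0
    for sample, out in zip(samples, outputs):
        for noisy, gold, pred in zip(sample.noisy, sample.gold, out):
            if noisy != gold:
                errors += 1
                fixed += pred == gold
    return fixed / errors if errors else 0.0


METRICS: Dict[str, Metric] = {"word_accuracy": word_accuracy, "error_recall": error_recall}


# --- The only part a researcher would write: the core algorithm -----------

class Corrector(Protocol):
    def correct(self, tokens: List[str]) -> List[str]: ...


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


class NearestWordCorrector:
    """Baseline: keep known words, replace unknown ones by the closest entry."""

    def __init__(self, dictionary: Sequence[str]) -> None:
        self.words = set(dictionary)  # assumed non-empty

    def correct(self, tokens: List[str]) -> List[str]:
        # Ties in distance are broken alphabetically so results are deterministic.
        return [t if t in self.words
                else min(self.words, key=lambda w: (levenshtein(t, w), w))
                for t in tokens]


# --- The platform's generic evaluation loop --------------------------------

def evaluate(corrector: Corrector, samples: Sequence[Sample],
             metrics: Dict[str, Metric]) -> Dict[str, float]:
    """Run the corrector on every sample and score the outputs with each metric."""
    outputs = [corrector.correct(list(s.noisy)) for s in samples]
    return {name: metric(samples, outputs) for name, metric in metrics.items()}


if __name__ == "__main__":
    print(evaluate(NearestWordCorrector(DICTIONARY), SAMPLES, METRICS))
```

On the toy data, word accuracy is 8/9 and error recall is 2/3: the dictionary baseline fixes the non-word errors *teh* and *dgo* but cannot detect the real-word error *ran* (for *sat*). Swapping in a context-aware corrector would only require a new class with a `correct` method; the collection, metrics and evaluation loop stay unchanged, which is the point of the lean setup.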
Keywords
Evaluation model, Framework, Error correction, Textual documents, Distance and similarity measure, Metrics, Information retrieval
References
- 1. Atkinson, K.: Aspell Spellchecker. http://aspell.net (2012). Accessed 15 Jan 2012
- 2. Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
- 3. Hirst, G., Budanitsky, A.: Correcting real-word spelling errors by restoring lexical cohesion. Nat. Lang. Eng. 11(1), 87–111 (2005)
- 4. Hirst, G., St-Onge, D.: Lexical chains as representations of context for the detection and correction of malapropisms, Chapter 13. In: Fellbaum, C. (ed.) WordNet: An Electronic Lexical Database, vol. 305, pp. 305–332. MIT Press, Cambridge (1998)
- 5. Kantor, P.B., Voorhees, E.M.: The TREC-5 confusion track: comparing retrieval methods for scanned text. Inf. Retrieval 2(2), 165–176 (2000)
- 6. Kukich, K.: Techniques for automatically correcting words in text. ACM Comput. Surv. (CSUR) 24(4), 377–439 (1992)
- 7. Mays, E., Damerau, F.J., Mercer, R.L.: Context based spelling correction. Inf. Process. Manag. 27(5), 517–522 (1991)
- 8. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
- 9. Mitton, R.: Ordering the suggestions of a spellchecker without using context. Nat. Lang. Eng. 15(2), 173–192 (2008)
- 10. Mudge, R.: After the Deadline. http://static.afterthedeadline.com (2012). Accessed 15 Jan 2012
- 11. OSGi Alliance: Open Services Gateway initiative. http://www.osgi.org (2012). Accessed 15 Jan 2012
- 12. Pedler, J.: Computer correction of real-word spelling errors in dyslexic text. Ph.D. thesis, Birkbeck, London University (2007)
- 13. Rosnay, J., Revelli, C.: Pronetarian Revolution (2006)
- 14. Ruch, P.: Using contextual spelling correction to improve retrieval effectiveness in degraded text collections. In: Proceedings of the 19th International Conference on Computational Linguistics, vol. 1, p. 7. Association for Computational Linguistics (2002)
- 15. Shannon, C.: A mathematical theory of communication. Bell Sys. Tech. J. 27, 379–423, 623–656 (1948)
- 16. Subramaniam, L.V., Roy, S., Faruquie, T.A., Negi, S.: A Survey of Types of Text Noise and Techniques to Handle Noisy Text. Language, pp. 115–122 (2009)
- 17. Varnhagen, C.K., McFall, G.P., Figueredo, L., Takach, B.S., Daniels, J., Cuthbertson, H.: Spelling and the web. J. Appl. Develop. Psychol. 30(4), 454–462 (2009)
- 18. Voorhees, E.M., Garofolo, J.: The TREC-6 spoken document retrieval track. Bull. Am. Soc. Inf. Sci. Technol. 26(5), 18–19 (2000)
- 19. Wikipedia Community: Wikipedia List of Common Misspellings. http://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings (2012). Accessed 15 Jan 2012
- 20. Wiktionary Community: Wiktionary Online Collaborative Dictionary. http://en.wiktionary.org/wiki/Wiktionary:Main_Page (2012). Accessed 15 Jan 2012
- 21. Wilcox-O’Hearn, A., Hirst, G., Budanitsky, A.: Real-word spelling correction with trigrams: a reconsideration of the Mays, Damerau, and Mercer model. In: Gelbukh, A. (ed.) CICLing 2008. LNCS, vol. 4919, pp. 605–616. Springer, Heidelberg (2008)
- 22. Wong, W., Liu, W., Bennamoun, M.: Integrated scoring for spelling error correction, abbreviation expansion and case restoration in dirty text. In: 5th Australasian Conference on Data Mining and Analytics (AusDM’06), Sydney, Australia, pp. 83–89. Australian Computer Society (2006)