Systematic review of spell-checkers for highly inflectional languages

Abstract

Performance of any word processor, search engine, social media relies heavily on the spell-checkers, grammar checkers etc. Spell-checkers are the language tools which break down the text to check the spelling errors. It cautions the user if there is any unintentional misspelling occurred in the text. In the area of spell-checking, we still lack an exhaustive study that covers aspects like strengths, limitations, handled errors, performance along with the evaluation parameters. In literature, spell-checkers for different languages are available and each one possesses similar characteristics however, have a different design. This study follows the guidelines of systematic literature review and applies it to the field of spell-checking. The steps of the systematic review are employed on 130 selected articles published in leading journals, premier conferences and workshops in the field of spell-checking of different inflectional languages. These steps include framing of the research questions, selection of research articles, inclusion/exclusion criteria and the extraction of the relevant information from the selected research articles. The literature about spell-checking is divided into key sub-areas according to the languages. Each sub-area is then described based on the technique being used. In this study, various articles are analyzed on certain criteria to reach the conclusion. This article suggests how the techniques from the other domains like morphology, part-of-speech, chunking, stemming, hash-table etc. can be used in development of spell-checkers. It also highlights the major challenges faced by researchers along with the future area of research in the field of spell-checking.

This is a preview of subscription content, access via your institution.

Fig. 1
Fig. 2
Fig. 3

Abbreviations

FSM:

Finite state machine

DLM:

Dictionary lookup method

MA:

Morphological analysis

ED:

Edit distance

MED:

Minimum edit distance

US:

Unicode splitting

cbLSTM:

Character-based longest short term memory

SM:

Soundex method

LED:

Levenstein edit distance

CS:

Confusion set

rMED:

Reverse minimum edit distance

DDLM:

Direct dictionary lookup method

EDM:

Edit distance method

PEM:

Phonetic encoding method

FSR:

Finite state representation

STM:

State table method

FSA:

Finite state automata

PAMC:

Partition around medoid clustering

DMEE:

Double metaphone encoding

WF:

Word frequency

S&SS:

Sound and shape similarity

rEDM:

Reverse edit distance method

HT:

Hash table

TBA:

Tree-based algorithm

POS:

Parts of speech

HMM:

Hidden Markov model

GUI:

Graphical user interface

FST:

Finite state transition

UWH:

Unknown word handling

UPH:

Unknown proper noun handling

POS:

Parts of speech

API:

Application programming interface

CW:

Constituent word

MBLP:

Memory based language model

FSTM:

Finite state transition model

DA:

Dictionary approach

CC:

Canti check

CSO:

Crowd sourcing

References

  1. Abdullah M, Islam Z, Khan M (2007) Error-tolerant finite-state recognizer and string pattern similarity based spelling-checker for Bangla. In: Proceeding of 5th international conference on natural language processing (ICON)

  2. Abeera VP, Aparna S, Rekha RU, Kumar MA, Dhanalakshmi V (2012) Morphological analyzer for Malayalam. In: Data engineering and management, pp 252–254

  3. Allen JD et al (2012) The unicode standard, vol 3. Mountain view, CA

    Google Scholar 

  4. Ambili T, Panchami KS, Subash N (2016) Automatic error detection and correction in Malayalam. IJSTE Int J Sci Technol Eng 3(02):92–96

    Google Scholar 

  5. Angell RC, Freund GE, Willett P (1983) Automatic spelling correction using a tri-gram similarity measure. Inf Process Manag 19(4):255–261

    Google Scholar 

  6. Badugu S (2014) Morphology based POS tagging on Telugu. Int J Comput Sci Issues 11(1):181–187

    Google Scholar 

  7. Balabantaray C, Sahoo B, Swain M, Sahoo K (2012) IIIT-Bh FIRE 2012 submission: MET Track Odia, pp 1–3

  8. Banks T (2008) Strategies, foreign language larning difficulaties and teching. Dominican University of California, San Rafael

    Google Scholar 

  9. Bansal A, Banerjee E, Jha GN (2013) Corpora creation for Indian language technologies—The ILCI Project. In: The 6th proceedings of language technology conference (LTC ‘13)

  10. Bhatti Z, Ismaili IA (2016) Phonetic-based Sindhi spell-checker system using a hybrid model. Digit Scholarsh Humanit 31(2):264–282

    Google Scholar 

  11. Bhatti Z, Ismaili IA, Shaikh AA, Javaid W (2012) Spelling error trends and patterns in Sindhi. J Emerg Trends Comput Inf Sci 3(10):1435–1439

    Google Scholar 

  12. Bhatti Z, Ismaili IA, Soomro WJ, Hakro DN (2014) Word segmentation model for Sindhi text. Am J Comput Res Repos 2(1):1–7

    Google Scholar 

  13. Bhowmik K (2014) Development of a word-based spell-checker for Bangla language. Military Institute of Science and Technology, United International University, Dhaka

    Google Scholar 

  14. Borah PP, Talukdar G, Baruah A (2014) Assamese word sense disambiguation using supervised learning. In: International conference on contemporary computing and informatics (IC3I). IEEE, pp 946–950

  15. Bruno M, Silva MJ (2004) Spelling correction for search engine queries. In: Advanced natural language processing. Springer, Berlin, pp 372–383

  16. Budanitsky A, Hirst G (2006) Evaluating wordnet-based measures of lexical semantic relatedness. Comput Linguist 32(1):13–47

    MATH  Google Scholar 

  17. Budgen D, Brereton P (2006) Performing systematic literature reviews in software engineering. In: ICSE’06 Proceedings of the 28th international conference on Software engineering, pp 1051–1052

  18. Cavnar WB, Trenkle JM (1994) N-gram-based text categorization. In: Proceedings of SDAIR-94, 3rd annual symposium on document analysis and information retrieval, pp 161–175

  19. Chakrabarti B (1994) A comparative study of Santali and Bengali. K.P. Bagchi & Co., Kolkata

    Google Scholar 

  20. Chaudhuri BB (2001) Reversed word dictionary and phonetically similar word grouping based spell-checker to Bangla text. In: Proceeding of LESAL Workshop, Mumbai

  21. Chaudhuri BB (2002) Towards Indian language spell-checker design. In: Proceedings—language engineering conference, LEC 2002, pp 139–146

  22. Choudhury R, Deb N, Kashyap K (2019) Context sensitive spelling checker for Assamese language. In: Kalita J, Balas V, Borah S, Pradhan R (eds) Recent developments in machine learning and data analytics. Springer, Singapore, pp 177–188

    Google Scholar 

  23. Cordeiro de Amorim R, Zampieri M (2013) Recent advances in natural language processing. In: IEEE international conference on recent advances in natural language processing, pp 172–178

  24. Dahar IA, Abbas F, Rajput U, Hussain A, Azhar F (2018) An efficient Sindhi spelling checker for microsoft word. Int J Comput Sci Netw Secur 18(5):144–150

    Google Scholar 

  25. Damerau FJ (1964) A technique for computer detection and correction of spelling errors. Commun ACM 7(3):171–176

    Google Scholar 

  26. Das M, Borgohain S, Gogoi J, Nair SB (2002a) Design and implementation of a spell-checker for Assamese. In: Language engineering conference, Proceedings IEEE, pp 156–162

  27. Das M, Borgohain S, Gogoi J (2002b) Design and implementation of a spell-checker for Assamese. In: Language engineering conference, proceedings IEEE, pp 156–162

  28. Das M, Borgohain S, Gogoi J, Nair SB (2002c) Design and implementation of a spell-checker for Assamese. In: Proceedings—language engineering conference, LEC 2002, pp 156–162

  29. Daud A, Khan W, Che D (2016) Urdu language processing: a survey. Artif Intell Rev 47(3):279–311

    Google Scholar 

  30. Dhanabalan T, Parthasarathi R, Geetha TV (2003) Tamil spell-checker. In: 6th Tamil internet conference, Chennai, Tamilnadu, India, pp 18–27

  31. Dhanju KS, Lehal GS, Saini TS, Kaur A (2015) Design and implementation of Punjabi spell-checker. Int J Sci Technol 8(27):1–12

    Google Scholar 

  32. Dongre VJ, Mankar VH (2010) A review of research on Devnagari character recognition. Int J Comput Appl 12(2):8–15

    Google Scholar 

  33. Dowlagar S, Mamidi R (2015) A semi supervised dialog act tagging for Telugu. In: Proceedings of the 12th international conference on natural language processing, pp 376–383

  34. Etoori P, Chinnakotla M, Mamidi R (2018) Automatic spelling correction for resource-scarce languages using deep learning. In: Proceeding of ACL 2018, Student research workshop, pp 146–152

  35. Fossati F, Di Eugenio B (2007) I saw TREE trees in the park: how to correct real-word spelling mistakes. In: LREC, pp 896–901

  36. Ganfure GO, Midekso D (2014) Design and implementation of morphology based spell-checker. Int J Sci Technol Res 3(12):118–125

    Google Scholar 

  37. Ghafour HHA, El-bastawissy A, Heggazy AFA (2011) AEDA : Arabic edit distance algorithm towards a new approach for Arabic name matching. In: IEEE, international conference on computer engineering and systems, pp 307–311

  38. Gokcay E, Gokcay D (1995) Combining statistics and heuristics for language identification. In: Proceedings of the 4th annual symposium on document analysis and information retrieval

  39. Gottron T, Lipka N (2010) A comparison of language identification approaches on short, query-style texts. Lecture notes in computer science, pp 611–614

  40. Goyal V, Lehal GS (2010) Automatic standardization of spelling variations of Hindi text. In: International conference on computer and communication technology ICCCT 2010, pp 764–767

  41. Gupta V (2014) Automatic stemming of words for Punjabi. In: Advances in signal processing and intelligent recognition systems, pp 73–84

  42. Gupta P, Goyal V (2009) Implementation of rule-based algorithm for Sandhi-Vicheda of compound Hindi words. Int J Comput Sci Issues 3:45–49

    Google Scholar 

  43. Gupta V, Lehal GS (2011) Punjabi language stemmer for nouns and proper names. In: Proceedings of the 2nd workshop on South and Southeast Asian Natural Language Processing (WSSANLP), IJCNLP 2011, pp 35–39

  44. Gupta V, Lehal GS (2019) Complete pre processing phase of Punjabi text extractive summarization system. In: Proceedings of COLING 2012: demonstration papers, pp 199–206

  45. Harrison GL, Goegan LD, Jalbert R, Mcmanus K, Sinclair K, Spurling J (2016) Predictors of spelling and writing skills in first and second language learners. Read Writ 29(1):69–89

    Google Scholar 

  46. Hassan A, Amin MR, Al Azad AK, Mohammed N (2017) Sentiment analysis on Bangla and Romanized Bangla text using deep recurrent models. In: IWCI 2016—2016 international workshop on computational intelligence, pp 51–56

  47. Hayes B, Lahiri A (1991) Bengali international phonology. Nat Lang Linguist Theory 9(1):47–96

    Google Scholar 

  48. Hema PH, Sunitha C (2016) Malayalam spell-checker using N-gram method. In: Computational intelligence in data mining-advances in intelligent systems and computing, vol 1, pp 217–225

  49. Heshaam F (2010) Detection and correction of real-word spelling errors in Persian language. In: IEEE-international conference on natural language processing and knowledge engineering (NLP-KE)

  50. Hoque T, Kaykobad M (2002) Coding system for Bangla spell-checker. In: 5th international conference on computer and information technology, pp 186–190

  51. Huang A (2008) Similarity measures for text document clustering. In: Proceedings of the 6th New Zealand computer science research student conference (NZCSRSC2008)

  52. Humayoun M, Ranta A (2014) Developing Punjabi morphology, corpus and lexicon. In: Proceedings of the 24th Pacific Asia conference on language, information and computation

  53. Hussain I, Saharia N, Sharma U (2011) Development of assamese wordnet. In: Nath B, Sharma U, Bhattacharyya DK (eds) Machine intelligence: recent advances. Narosa Publishing House, ISBN-978-81-8487-140-1

  54. Iqbal S, Anwar W, Bajwa UI, Rehman Z (2013) Urdu spell-checking: reverse edit distance approach. In: Proceedings of the 4th workshop on South and Southeast Asian Natural Language Processing, pp 58–65

  55. Islam A, Inkpen D (2009) Real-word spelling correction using google web 1T 3-grams. In: EMNLP’09, conference on empirical methods in natural language processing, pp 1241–1249

  56. Islam MZ, Uddin M, Khan M (2007) A light weight stemmer for Bengali and its use in spelling checker. In: Proceedings of international conference on digital communication and computer applications (DCCA), pp 19–23

  57. Jain A, Jain M (2014) Detection and correction of non-word spelling errors in Hindi language. In: International conference on data mining and intelligent computing (ICDMIC)

  58. Jain U, Kaur J (2015) Text chunker for Punjabi. Int J Curr Eng Technol 5(5):3349–3353

    Google Scholar 

  59. Jananie S, Sarveswaran K (2014) Hybrid approach for spell-checking of Tamil language. In: Proceedings of the Peradeniya University, International Research Session, vol 18, no 1

  60. Jindal S (2017) Building English–Punjabi parallel corpus for machine translation. Int J Comput Appl 180(8):26–29

    Google Scholar 

  61. Justin Z, Dart P (1995) Finding approximate matches in large lexicons. Softw Pract Exp 25(3):331–345

    Google Scholar 

  62. Kabeer R, Idicula SM (2014) Text summarization for Malayalam documents—an experience. In: Proceedings of international conference on data science and engineering, ICDSE 2014, pp 145–150

  63. Kashyap K, Sarma H, Sarma SK (2015) Luitspell: development of an Assamese language spell-checker for open office writer. Eur J Adv Eng Technol 2(5):135–138

    Google Scholar 

  64. Kashyap L, Joshi SR, Bhattacharyya P (2017) Insights on Hindi WordNet coming from the IndoWordNet. In: The Wordnet in Indian languages, pp 19–43

  65. Kaur H, Kaur G, Kaur M (2015) Punjabi spell-checker using dictionary clustering. Int J Sci Eng Technol Res 4(7):2369–2374

    Google Scholar 

  66. Keselj V, Peng F, Cercone N, Thomas C (2003) N-gram based author profiles for authorship attribution. In: Proceedings of the conference of the Pacific association for computational linguistics (PACLING)

  67. Khan NH, Saha GC (2014) Checking the correctness of Bangla words using N-gram. Int J Comput Appl 89(11):1–3

    Google Scholar 

  68. Kleenankandy J (2014) Implementation of Sandhi-rule based compound word generator for Malayalam. In: Proceedings of 4th international conference on advances in computing and communications, ICACC 2014, pp 134–137

  69. Kukich K (1992) Technique for automatically correcting words in text. ACM Comput Surv 24(4):377–439

    Google Scholar 

  70. Kumar SS, Suma S, Sneha N (2017) Spell-checker for Kannada OCR. Int Digit Libr Technol Res 1(4):1–12

    Google Scholar 

  71. Lakshmi K, Babu T (2018) A new hybrid algorithm for Telugu word retrieval and recognition. Int J Intell Eng Syst 11(4):117–127

    Google Scholar 

  72. Lawaye AA, Purkayastha BS (2016) Design and implementation of spell-checker for Kashmiri. Int J Sci Res 5(7):199–202

    Google Scholar 

  73. Lehal GS (2007) Design and implementation of Punjabi spell-checker. Int J Syst Cybern Inform 3(8):70–75

    Google Scholar 

  74. Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions, and reversals. Sov Phys Dokl 10(8):707–710

    MathSciNet  Google Scholar 

  75. Mahar JA, Shaikh H, Memon GQ (2012) A model for Sindhi text segmentation in word tokens. Sindh Univ Res J SUR J (Sci Ser) 44(1):43–48

    Google Scholar 

  76. Mala C, Parameshwari K, Rao GUM, Kulkarni AP (2012) Telugu spell-checker. In: International Telugu internet conference proceedings, pp 1–8

  77. Mandal P, Hossain BMM (2017a) Clustering-based Bangla Spell-checker. In: IEEE international conference on imaging, vision and pattern recognition (icIVPR)

  78. Mandal P, Hossain BMM (2017b) A systematic literature review on spell-checkers for Bangla language. Int J Mod Educ Comput Sci 9(6):40–47

    Google Scholar 

  79. Manohar N, Lekshmipriya PT, Jayan V, Bhadran VK (2015) Spell-checker for Malayalam using finite state transition models. In: IEEE recent advances in intelligent computational systems, RAICS 2015, pp 157–161

  80. Mateen A, Malik MK, Nawaz Z, Danish HM, Siddiqui MH (2017) A hybrid stemmer of Punjabi Shahmukhi script. Int J Comput Sci Netw Secur 17(8):90–97

    Google Scholar 

  81. Mishra D, Venugopalan M, Gupta D (2016) Context-specific lexicon for Hindi reviews. In: 6th international conference on advances in computing and communications, ICACC 2016, vol 93, pp 554–563

  82. Mittal S, Sethi NS, Sharma SK (2014) Part of speech tagging of Punjabi language using N gram model. Int J Comput Appl 100(19):20–23

    Google Scholar 

  83. Mohapatra DD (2018) A sketch of Odia morphology. Glob J Res Anal 7(4):80–81

    Google Scholar 

  84. Mon AM (2012) Spell-checker for Myanmar language. In: International conference on information retrieval and knowledge management (CAMP). IEEE, pp 12–16

  85. Murthy KN (2001) Computer processing of Kannada language. Workshop at Kannada University, pp 1–10

  86. Mustafa SH (2005) Character contiguity in N -gram-based word matching: the case for Arabic text searching. Inf Process Manag 41:819–827

    Google Scholar 

  87. Naseem T (2004) A hybrid approach for Urdu spell-checking. National University of Computer & Emerging Sciences

  88. Naseem T, Hussain S (2007) A novel approach for ranking spelling error corrections for Urdu. Lang Resour Eval 41(2):117–128

    Google Scholar 

  89. Nielsen J (1999) Internet-based spelling checker dictionary system with automatic updating

  90. Nisha M, Reji Rahmath K, Rekha Raj CT, Reghu Raj PC (2015) Malayalam morphological analysis using MBLP approach. In: Proceedings of international conference on soft-computing and network security, ICSNS 2015

  91. Pareek G, Modi D (2016) Feature extraction in Hindi text summarization. Ski Res J 6(2):14–19

    Google Scholar 

  92. Peterson JL (1980) Computer programs for detecting and correcting spelling errors. Commun ACM 23(12):676–687

    Google Scholar 

  93. Prathibha RJ, Padma MC (2016) Design of morphological analyzer for Kannada inflectional words using hybrid approach. Int J Comput Linguist Res 7(4):133–161

    Google Scholar 

  94. Pratip S, Chaudhuri BB (2013) A simple real-word error detection and correction using local word bigram and trigram. In: Proceedings of the 25th conference on computational linguistics and speech processing (ROCLING 2013), pp 211–220

  95. Puri R, Bedi RPS, Goyal V (2015) Punjabi stemmer using Punjabi wordnet database. Indian J Sci Technol 8(27):1–5

    Google Scholar 

  96. Rahman MU (2015) Towards Sindhi corpus construction. In: Conference on language and technology, pp 1–6

  97. Rahutomo F, Kitasuka T, Aritsugi M (2012) Semantic cosine similarity. In: 7th international student conference on advanced science and technology ICAST

  98. Rajashekara Murthy S, Akshatha AN, Upadhyaya CG, Ramakanth Kumar P (2017) Kannada spell-checker with Sandhi splitter. In: International conference on advances in computing, communications and informatics, ICACCI 2017, pp 950–956

  99. Rajashekara Murthy S, Madi V, Sachin D, Ramakanth PK (2012) A non-word Kannada spell-checker using morphological analyzer and dictionary lookup method. Int J Eng Sci Emerg Technol 2(2):43–52

    Google Scholar 

  100. Rama T, Sowmya V (2018) A dependency treebank for Telugu. In: Proceedings of the 16th international workshop on treebanks and linguistics theories, pp 119–128

  101. Robertson AM, Willet P (1998) Applications of N-grams in textual information systems. J Doc 54(1):48–67

    Google Scholar 

  102. Rout Y, Santi PK, Subudhi S, Sahu B (2013) An approach for designing Odia spell-checker. In: National conference on recent advances on business intelligence & data mining (RABIDM 2013), pp 1–7

  103. Saharia N (2011) A first step towards parsing of Assamese text. Spec Vol Probl Parsing Indian Lang 11(5):30–34

    Google Scholar 

  104. Saharia N, Konwar KM (2012) LiuitPad: a fully unicode compatible Assamese writing software. In: Proceedings of the 2nd workshop an advances in text input methods (WTIM 2) COLLING 2012, pp 79–88

  105. Saharia N, Das D, Sharma U, Kalita J (2009) Part of speech tagger for Assamese text. In: Proceedings of the ACL-IJCNLP conference short papers, pp 33–36

  106. Saharia N, Sharma U, Kalita J (2012) Analysis and evaluation of stemming algorithms : a case study with Assamese. In: International conference on advances in computing, communications and informatics, ICACCI 2012, pp 842–846

  107. Sahoo K, Vidyasagar VE (2003) Kannada WordNet—a lexical database. In: Conference on convergent technologies for Asia-Pacific Region (TENCON 2003), vol 4, pp 1352–1356

  108. Sakuntharaj R, Mahesan S (2016) A novel hybrid approach to detect and correct spelling in Tamil text. In: International conference on information and automation for sustainability: interoperable sustainable smart systems for next generation, ICIAFS 2016. IEEE, pp 1–6

  109. Sakuntharaj R, Mahesan S (2017) Use of a novel hash-table for speeding-up suggestions for misspelt Tamil words. In: International conference on industrial and information systems (ICIIS) IEEE, pp 1–5

  110. Santosh T, Sulochana KG, Kumar RR (2002) Malayalam spell-checker. In: Proceedings of the international conference on universal knowledge and language

  111. Saranya SK (2008) Morphological analyzer for Malayalam verbs. Amrita Vishwa Vidyapeetham, Amrita School of Engineering, Coimbatore

    Google Scholar 

  112. Sarma P (2017) An approach to prepare lexicons of Assamese text for unit selection concatenation TTS. Int J Emerg Trends Sci Technol 4(8):5631–5637

    Google Scholar 

  113. Sarma SK, Medhi R, Gogoi M, Saikia U (2010) Foundation and structure of developing an Assamese wordnet. In: Proceedings of 5th international conference of the global WordNet Association

  114. Sarmah J, Barman AK, Sharma SK (2013) Automatic Assamese text categorization using wordnet. In: International conference of advances in computing, communications and informatics IEEE, pp 85–89

  115. Segar J, Sarveswaran K (2015) Contextual spell-checking for Tamil language. In: 14th Tamil internet conference, pp 1–5

  116. Sekhar N, Pushpak D, Jyoti B (2017) The WordNet in Indian Languages. Springer Nature, Singapore

    Google Scholar 

  117. Sethi DP (2014) A survey on Odia computational morphology. Int J Adv Res Comput Eng Technol 3(3):623–625

    Google Scholar 

  118. Shaalan K, Allam A, Gomah A (2003) Towards automatic spell-checking for Arabic. In: Proceedings of the 4th conference on language engineering, Egyptian Society of language engineering (ELSE), Egypt, pp 240–247

  119. Shah ZA, Mashori GM (2013) Oxford English-Sindhi dictionary: a critical study in lexicography. ELF Annu Res J 13:37–46

    Google Scholar 

  120. Shambhavi BR, Ramakanth Kumar P, Srividya K, Jyothi BJ, Kundargi S, Shastri G (2011) Kannada morphological analyser and generator using trie. Int J Comput Sci Netw Secur 11(1):112–116

    Google Scholar 

  121. Sheykholeslam MH, Minaei-Bidgoli B, Juzi H (2013) A framework for spelling correction in Persian language using noisy channel model. In: LREC, pp 58–65

  122. Singh A (2016) Review for dialects in Punjabi language. Int J Innov Adv Comput Sci 5(8):25–30

    Google Scholar 

  123. Singh J, Singh G, Singh R, Singh P (2018) Morphological evaluation and sentiment analysis of Punjabi text using deep learning classification. J King Saud Univ Comput Inf Sci. https://doi.org/10.1016/j.jksuci.2018.04.003

    Article  Google Scholar 

  124. Sinha RMK, Singh KS (1984) A programme for correction of single spelling errors in Hindi words. IETE J Res 30(6):249–251

    Google Scholar 

  125. Solak A (1993) Design and implementation of a spelling checker for Turkish. Institute of Engineering & Sciences, Bilkent University, Ankara

    Google Scholar 

  126. Sooraj S, Manjusha K, Anand Kumar M, Soman KP (2018) Deep learning based spell-checker for Malayalam language. J Intell Fuzzy Syst 34(3):1427–1434

    Google Scholar 

  127. Strnad J (2001) Hindi dictionaries and the Hindi lexicographical corpus. Festschrift Helmut Nespital, pp 1–14

  128. Subhashini R, Kumar VJS (2010) Evaluating the performance of similarity measures used in document clustering and information retrieval. In: IEEE, 1st international conference on integrated intelligent computing

  129. Tomovic A, Janicic P, Keselj V (2006) N-gram based classification and unsupervised hierarchical clustering of genome sequences. Comput Methods Program Biomed 81:137–153

    Google Scholar 

  130. Uzzaman N, Khan M (2006) A comprehensive Bangla spelling checker. BRAC University, Dhaka

    Google Scholar 

  131. Varghese ST, Sulochana KG, Kumar RR (2002) Malayalam spell-checker. In: Proceedings of the international conference on universal knowledge and language

  132. Veerappan R, Antony PJ, Saravanan S, Soman KP (2011) A rule-based Kannada morphological analyzer and generator using finite state transducer. Int J Comput Appl 27(10):45–52

    Google Scholar 

  133. Verberne S (2002) Context-sensitive spell-checking based on word trigram probabilities

  134. Wasala A, Weerasinghe R, Pushpananda R (2010) A data-driven approach to checking and correcting spelling errors in Sinhala. Int J Adv ICT Emerg Reg 03(01):11–24

    Google Scholar 

  135. Wu S, Mamber U (1992) AGREP—a fast approximate pattern matching tool. In: Proceedings of the Winter 1992 USENIX conference San Francisco USA. Berkeley, pp 153–162

  136. Yue T, Briand LC, Labiche Y (2011) A systematic review of transformation approaches between user requirements and analysis models. Requir Eng 16(2):75–99

    Google Scholar 

  137. Zampieri M, Cordeiro de Amorim R (2014) Between sound and spelling: combining phonetics and clustering algorithms to improve target word recovery. In: International conference on natural language processing, pp 438–449

  138. Zhang Y, Zhao X (2013) Automatic error detection and correction of text: the state of the art. In: 6th international conference on intelligent networks and intelligent systems, ICINIS, pp 274–277

  139. Zhuang L, Bao T, Zhu X, Wang C, Naoi S (2004) A Chinese OCR spelling check approach based on statistical language models. In: International conference on systems, man and cybernetics, IEEE, vol 5, pp 4727–4732

Download references

Acknowledgements

The authors thank the reviewers for their insightful comments. The authors would also like to thank Ministry of Electronics and IT, Government of INDIA, for providing fellowship under Grant Number: PhD-MLA-4 (69)/2015-16 (Visvesvaraya PhD Scheme for Electronics and IT) to pursue Ph.D. work.

Author information

Affiliations

Authors

Corresponding author

Correspondence to Shashank Singh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Singh, S., Singh, S. Systematic review of spell-checkers for highly inflectional languages. Artif Intell Rev 53, 4051–4092 (2020). https://doi.org/10.1007/s10462-019-09787-4

Download citation

Keywords

  • Spelling
  • Spell-check
  • Non-word errors
  • Real-word errors
  • Dictionary lookup
  • Edit-distance
  • Recurrent neural network (RNN)