Automatic error correction in inflected languages

Abstract

Systems for automatic detection and correction of spelling errors in natural language texts are considered. The development of such systems for both English and Russian (and for inflected languages in general, including all Slavic languages) is discussed. An approach associated with morphological analysis of the wordforms in the given text is described. The topics considered in the paper include the main methods of automatic spelling correction, levels of automation of the spelling error correction process, the effect of the type of computer used, the use of spelling error correctors in a stand-alone mode and in combination with word-processing software, and the maintenance of auxiliary dictionaries.

Additional information

Translated from Itogi Nauki i Tekhniki, Seriya Teoriya Veroyatnostei, Matematicheskaya Statistika, Teoreticheskaya Kibernetika, Vol. 28, pp. 111–139, 1988.

About this article

Cite this article

Bol'shakov, I.A. Automatic error correction in inflected languages. J Math Sci 56, 2263–2279 (1991). https://doi.org/10.1007/BF01099203

Keywords

  • Natural Language
  • Error Correction
  • Morphological Analysis
  • Main Method
  • Automatic Detection