Typographical Nearest-Neighbor Search in a Finite-State Lexicon and Its Application to Spelling Correction

  • Agata Savary
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2494)

Abstract

We describe a method of error-tolerant lookup in a finite-state lexicon and its application to automatic spelling correction, and we compare it to the algorithm by K. Oflazer [14]. Whereas Oflazer’s algorithm searches for all possible corrections of a misspelled word that lie within a given similarity threshold, our approach retains only the most similar corrections (nearest neighbours), dynamically reducing the search space in the lexicon, and aims to reach the first correction as soon as possible.
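The contrast drawn in the abstract can be illustrated with a small sketch. It is not the algorithm of the paper: it assumes plain Levenshtein distance (insertion, deletion, substitution), stands in a trie of nested dictionaries for the finite-state lexicon, and uses the hypothetical names `build_trie` and `nearest_neighbours`. What it does show is the general idea of keeping only the closest candidates and tightening the distance bound during the traversal, so that branches which can no longer beat the best correction found so far are abandoned.

```python
END = ""  # marker key for "a word ends here"; never clashes with a 1-char key


def build_trie(words):
    """Build a trie as nested dicts (a stand-in for a finite-state lexicon)."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def nearest_neighbours(trie, word, max_dist=2):
    """Return the lexicon words closest to `word`, at distance <= max_dist.

    Assumption: plain Levenshtein distance. The bound `best` starts at
    max_dist and shrinks whenever a closer word is found, which prunes
    the rest of the search.
    """
    best = max_dist
    found = []

    def visit(node, prefix, prev_row):
        nonlocal best, found
        # prev_row[-1] is the distance between `prefix` and the whole input.
        if END in node and prev_row[-1] <= best:
            if prev_row[-1] < best:
                best = prev_row[-1]   # tighter bound: drop farther candidates
                found = []
            found.append(prefix)
        for ch, child in node.items():
            if ch == END:
                continue
            # Next row of the edit-distance table for prefix + ch.
            row = [prev_row[0] + 1]
            for j in range(1, len(word) + 1):
                cost = 0 if word[j - 1] == ch else 1
                row.append(min(row[j - 1] + 1,          # insertion
                               prev_row[j] + 1,         # deletion
                               prev_row[j - 1] + cost)) # match / substitution
            # min(row) is a lower bound for every word below this node.
            if min(row) <= best:
                visit(child, prefix + ch, row)

    visit(trie, "", list(range(len(word) + 1)))
    return sorted(found)


lexicon = build_trie(["search", "starch", "march", "church", "seat"])
print(nearest_neighbours(lexicon, "serch"))
```

An exhaustive threshold search with `max_dist=2` would also report `starch` and `march` for the input `serch`; the nearest-neighbour variant keeps only `search`, because the bound drops to 1 once that word has been reached.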

Keywords

Edit Distance; Error Distance; Editing Operation; Computational Linguistics; Input Word
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  [1] Daciuk, J.: Incremental Construction of Finite-State Automata and Transducers, and Their Use in the Natural Language Processing. Ph.D. Thesis, Politechnika Gdanska, Gdansk (1988)
  [2] Daciuk, J., Mihov, S., Watson, B., Watson, R.: Incremental Construction of Minimal Acyclic Finite State Automata. Computational Linguistics, Vol. 26(1). MIT Press, Massachusetts (2000) 3–16
  [3] Damerau, F. J.: A Technique for Computer Detection and Correction of Spelling Errors. Communications of the ACM, Vol. 7(3) (1964) 171–176
  [4] Du, M. W., Chang, S. C.: A model and a fast algorithm for multiple errors spelling correction. Acta Informatica, Vol. 29. Springer Verlag (1992) 281–302
  [5] Golding, A., Schabes, Y.: Combining Trigram-based and Feature-based Methods for Context-Sensitive Spelling Correction. Proceedings, 34th Annual Meeting of the Association for Computational Linguistics (ACL), Santa Cruz. Association for Computational Linguistics (1996) 71–78
  [6] Hall, P., Dowling, G.: Approximate String Matching. ACM Computing Surveys, Vol. 12(4). ACM, New York (1980) 381–402
  [7] Kaplan, R., Kay, M.: Regular Models of Phonological Rule Systems. Computational Linguistics, Vol. 20(3). Cambridge, Massachusetts, MIT Press (1994)
  [8] Kornai, A. (ed.): Extended Finite State Models of Language. Cambridge University Press, Cambridge, UK-New York, USA-Melbourne, Australia (1999)
  [9] Kukich, K.: Techniques for Automatically Correcting Words in Text. ACM Computing Surveys, Vol. 24(4) (1992)
  [10] Laporte, E., Silberztein, M.: Vérification et correction orthographiques assistées par ordinateur. Actes de la Convention IA 89 (1989)
  [11] Lowrance, R., Wagner, R. A.: An Extension of the String-to-String Correction Problem. Journal of the ACM, Vol. 22(2) (1975) 177–183
  [12] McIlroy, M. D.: Development of a Spelling List. IEEE Transactions on Communications, COM-30(1) (1982) 91–99
  [13] Mohri, M.: Minimization of sequential transducers. Lecture Notes in Computer Science, Vol. 807. Springer Verlag, Berlin (1994)
  [14] Oflazer, K.: Error-tolerant finite state recognition with applications to morphological analysis and spelling correction. Computational Linguistics, Vol. 22(1). MIT Press, Cambridge, Massachusetts (1996) 73–89
  [15] Ren, X., Perrault, F.: The Typology of Unknown Words: An Experimental Study of Two Corpora. Proceedings, 15th International Conference on Computational Linguistics (COLING), Nantes. International Committee on Computational Linguistics (1992) 408–414
  [16] Roche, E., Schabes, Y. (eds.): Finite-State Language Processing. MIT Press, Cambridge, Massachusetts (1997)
  [17] Véronis, J.: Morphosyntactic correction in natural language interfaces. Proceedings, 13th International Conference on Computational Linguistics (COLING), Budapest. International Committee on Computational Linguistics (1988) 708–713
  [18] Wagner, R. A., Fischer, M. J.: The String-to-String Correction Problem. Journal of the ACM, Vol. 21(1) (1974) 168–173
  [19] Watson, B.: Taxonomies and Toolkits of Regular Language Algorithms. Ph.D. Thesis, Eindhoven University of Technology, the Netherlands (1995)

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Agata Savary
    • 1
  1. LADL, IGM, Université de Marne-la-Vallée, Marne-la-Vallée, France
