On Compact Storage Models for Gazetteers

  • Jakub Piskorski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4002)

Abstract

This paper describes compact storage models for gazetteers using state-of-the-art finite-state technology. In particular, we compare the standard method, which is based on numbered indexing automata coupled with an auxiliary storage device, against a pure finite-state representation. The latter proves superior in terms of space and time complexity when applied to real-world test data. Further, we identify some pros and cons of both approaches and provide results of empirical experiments, which form handy guidelines for selecting a suitable data structure for implementing a gazetteer.
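
To make the first of the two compared approaches concrete, the following is a minimal sketch of a numbered indexing automaton paired with an auxiliary table. It is an illustration under stated assumptions, not the paper's implementation: it uses a plain, unminimized trie where a real system would use a minimal acyclic automaton, and the names Node, build_gazetteer, index_of and lookup, as well as the sample entries and annotations, are hypothetical. Each state stores the number of entries reachable from it, so a lookup can compute a unique index for an entry and use it to address the annotation in a separate array.

```python
class Node:
    """One automaton state (hypothetical helper, not from the paper)."""
    __slots__ = ("children", "final", "count")

    def __init__(self):
        self.children = {}   # transition label -> target state
        self.final = False   # True if an entry ends in this state
        self.count = 0       # number of entries accepted from this state


def _number(node):
    # Annotate every state with the size of its right language.
    node.count = int(node.final) + sum(_number(c) for c in node.children.values())
    return node.count


def index_of(root, word):
    """Return the index of `word` among all stored entries, or -1 if absent."""
    node, idx = root, 0
    for ch in word:
        if ch not in node.children:
            return -1
        if node.final:
            idx += 1  # a stored entry that is a proper prefix precedes `word`
        for label in sorted(node.children):
            if label == ch:
                break
            idx += node.children[label].count  # skip all smaller branches
        node = node.children[ch]
    return idx if node.final else -1


def build_gazetteer(pairs):
    """Build the numbered trie plus the auxiliary annotation table."""
    root = Node()
    for word, _ in pairs:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.final = True
    _number(root)
    table = [None] * root.count          # auxiliary storage, addressed by index
    for word, annotation in pairs:
        table[index_of(root, word)] = annotation
    return root, table


def lookup(root, table, word):
    i = index_of(root, word)
    return table[i] if i >= 0 else None


# Made-up sample data, for illustration only.
root, table = build_gazetteer([
    ("Bonn", "city"),
    ("Berlin", "city"),
    ("Bern", "city|capital"),
])
print(lookup(root, table, "Bern"))
```

In the pure finite-state alternative, the annotation would instead be encoded in the automaton itself (for example, as part of the accepted string or as transducer output), so no auxiliary table is needed. Which variant is smaller or faster depends on the data, which is the question the paper examines empirically.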

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jakub Piskorski (1)
  1. DFKI GmbH, German Research Center for Artificial Intelligence, Saarbrücken, Germany