On Compact Storage Models for Gazetteers

  • Jakub Piskorski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4002)


This paper describes compact storage models for gazetteers using state-of-the-art finite-state technology. In particular, we compare the standard method, based on numbered indexing automata associated with an auxiliary storage device, against a pure finite-state representation; the latter is superior in terms of space and time complexity when applied to real-world test data. Further, we pinpoint some pros and cons of both approaches and provide results of empirical experiments, which form handy guidelines for selecting a suitable data structure for implementing a gazetteer.
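
To make the two storage models concrete, the following is a minimal sketch and not the paper's implementation. It assumes a plain, unminimized trie in place of a minimal automaton, and every name in it (`Node`, `build_trie`, `word_to_index`, `lookup_numbered`, `lookup_pure`, the tab separator `SEP`, and the toy data) is hypothetical. In the numbered variant each state stores how many keys are reachable from it, so a key maps to its rank, which indexes an auxiliary table of annotations. In the pure finite-state variant, one possible encoding stores the annotation inside the automaton as a continuation of the key.

```python
class Node:
    """Trie state; `count` is the number of stored strings reachable from it."""
    __slots__ = ("edges", "final", "count")

    def __init__(self):
        self.edges = {}
        self.final = False
        self.count = 0


def _number(node):
    # Post-order pass: count the accepted strings below (and at) each state.
    node.count = int(node.final) + sum(_number(c) for c in node.edges.values())
    return node.count


def build_trie(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.edges.setdefault(ch, Node())
        node.final = True
    _number(root)
    return root


def word_to_index(root, word):
    """Rank of `word` among the stored strings in sorted order, or -1."""
    node, index = root, 0
    for ch in word:
        if ch not in node.edges:
            return -1
        if node.final:
            index += 1  # a string ending here sorts before its extensions
        # Skip every string that branches off through a smaller label.
        index += sum(c.count for label, c in node.edges.items() if label < ch)
        node = node.edges[ch]
    return index if node.final else -1


# Toy gazetteer (illustrative data only).
GAZETTEER = {
    "Bonn": ["city"],
    "Bonnieux": ["city"],
    "Paris": ["city", "given_name"],
}

# Model 1: numbered automaton plus auxiliary storage indexed by key rank.
keys = sorted(GAZETTEER)
numbered = build_trie(keys)
aux = [GAZETTEER[k] for k in keys]


def lookup_numbered(key):
    i = word_to_index(numbered, key)
    return aux[i] if i >= 0 else []


# Model 2: pure finite-state; annotations are stored as part of the paths.
SEP = "\t"  # assumed separator that never occurs in keys
pure = build_trie(f"{k}{SEP}{a}" for k, v in GAZETTEER.items() for a in v)


def _suffixes(node, prefix=""):
    if node.final:
        yield prefix
    for label, child in node.edges.items():
        yield from _suffixes(child, prefix + label)


def lookup_pure(root, key):
    node = root
    for ch in key + SEP:
        node = node.edges.get(ch)
        if node is None:
            return []
    return sorted(_suffixes(node))
```

With this toy data, `lookup_numbered("Paris")` and `lookup_pure(pure, "Paris")` return the same annotations; the difference lies in where they are stored. The sketch says nothing about the space and time trade-offs measured in the paper, which depend on minimization and on the physical encoding of states and transitions.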


Keywords (machine-generated, not supplied by the authors): Outgoing Transition; Physical Storage; Handy Guideline; Incremental Construction; Sequential Path





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jakub Piskorski (1)
  1. DFKI GmbH, German Research Center for Artificial Intelligence, Saarbrücken, Germany
