HFST—Framework for Compiling and Applying Morphologies

  • Krister Lindén
  • Erik Axelson
  • Sam Hardwick
  • Tommi A. Pirinen
  • Miikka Silfverberg
Part of the Communications in Computer and Information Science book series (CCIS, volume 100)

Abstract

HFST–Helsinki Finite-State Technology (hfst.sf.net) is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently connects some of the most important finite-state tools for creating morphologies and spellers into one open-source platform and supports extending and improving the descriptions with weights to accommodate the modeling of statistical information. HFST offers a path from language descriptions to efficient language applications in key environments and operating systems. HFST also provides an opportunity to exchange transducers between different software providers in order to get the best out of each finite-state library.
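To make the idea of a weighted morphological description more concrete, here is a toy sketch in plain Python. It is not HFST's API or file format. It models a tiny weighted transducer that maps surface words to analyses and ranks competing analyses by total weight. Following the tropical-semiring convention, path weights are added and a lower total is better. The class name `WeightedFST`, its methods, and the sample entries and weights are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


class WeightedFST:
    """Toy weighted finite-state transducer (hypothetical, not HFST's API).

    Weights combine tropical-semiring style: weights along a path are added,
    and among competing paths the lowest total weight is the best one.
    """

    def __init__(self):
        # state -> list of (input symbol, output symbol, arc weight, target state);
        # "" stands for epsilon (no symbol).
        self.arcs = defaultdict(list)
        self.finals = {}        # final state -> final weight
        self.num_states = 1     # state 0 is the start state

    def _new_state(self):
        self.num_states += 1
        return self.num_states - 1

    def add_entry(self, surface, analysis, weight):
        """Add one path mapping `surface` to `analysis`.

        Characters are paired position by position, and the shorter side is
        padded with epsilons. The entry's weight is stored as the final
        weight of its path.
        """
        state = 0
        for i in range(max(len(surface), len(analysis))):
            in_sym = surface[i] if i < len(surface) else ""
            out_sym = analysis[i] if i < len(analysis) else ""
            target = self._new_state()
            self.arcs[state].append((in_sym, out_sym, 0.0, target))
            state = target
        self.finals[state] = weight

    def lookup(self, word):
        """Return (analysis, weight) pairs for `word`, lowest weight first.

        Depth-first search; this terminates because add_entry only builds
        acyclic paths, so there are no epsilon cycles.
        """
        best = {}
        stack = [(0, 0, "", 0.0)]  # (state, input position, output so far, weight)
        while stack:
            state, pos, out, weight = stack.pop()
            if pos == len(word) and state in self.finals:
                total = weight + self.finals[state]
                # If several paths yield the same output, keep the best weight.
                if out not in best or total < best[out]:
                    best[out] = total
            for in_sym, out_sym, arc_weight, target in self.arcs[state]:
                if in_sym == "":
                    # Epsilon input: follow the arc without consuming input.
                    stack.append((target, pos, out + out_sym, weight + arc_weight))
                elif pos < len(word) and word[pos] == in_sym:
                    stack.append((target, pos + 1, out + out_sym, weight + arc_weight))
        return sorted(best.items(), key=lambda pair: pair[1])


if __name__ == "__main__":
    fst = WeightedFST()
    fst.add_entry("saw", "see+V+Past", 1.0)  # invented weights, for illustration only
    fst.add_entry("saw", "saw+N", 2.0)
    fst.add_entry("sea", "sea+N", 1.5)
    print(fst.lookup("saw"))  # [('see+V+Past', 1.0), ('saw+N', 2.0)]
```

The ambiguous form "saw" receives two analyses, and the invented weights rank the verb reading first. A production transducer would normally share common prefixes and suffixes between entries through determinization and minimization. This sketch builds one separate path per entry to keep the code short.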

Keywords

Finite-state libraries, finite-state morphology, natural language applications

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Krister Lindén (1)
  • Erik Axelson (1)
  • Sam Hardwick (1)
  • Tommi A. Pirinen (1)
  • Miikka Silfverberg (1)
  1. Department of Modern Languages, University of Helsinki, Helsinki, Finland