HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers

  • Conference paper
State of the Art in Computational Morphology (SFCM 2009)

Abstract

Morphological analysis of a wide range of languages can be implemented efficiently using finite-state transducer technologies. Over the last 30 years, a number of attempts have been made to create tools for computational morphologies. The two main competing approaches have been parallel and cascaded rule application. Parallel rule application was originally introduced by Koskenniemi [1] and implemented in tools like TwolC and LexC. Many current applications of morphologies could use dictionaries encoding the a priori likelihoods of words and expressions as well as the likelihood of relations to other representations or languages. We have chosen to create open-source tools and language descriptions so that as many people as possible can participate in the effort. This article presents some of the main tools that we have created, such as HFST-LexC, HFST-TwolC and HFST-Compose-Intersect. We evaluate their efficiency in comparison to similar tools and libraries, in particular using several full-fledged morphological descriptions. Our tools compare well with similar open-source tools, even if we still have some challenges ahead before we can catch up with the commercial tools. We demonstrate that, for various reasons, a parallel rule approach still seems to be more efficient than a cascaded rule approach when developing finite-state morphologies.

References

  1. Koskenniemi, K.: Two-Level Morphology: A General Computational Model for Word-Form Recognition and Production. University of Helsinki, Department of General Linguistics (1983)

  2. Karttunen, L., Koskenniemi, K., Kaplan, R.: A Compiler for Two-Level Phonological Rules. CSLI Publications (1987), http://www2.parc.com/istl/members/karttune/publications/archive/twolcomp.pdf

  3. Karttunen, L.: Two-Level Rule Compiler, Technical Report ISTL-92-2, Xerox Palo Alto Research Center (1992), http://www.xrce.xerox.com/competencies/content-analysis/fssoft/docs/twolc-92/twolc92.html

  4. Karttunen, L.: Finite-State Lexicon Compiler. Technical Report, ISTL-NLTT-1993-04-02, Xerox Palo Alto Research Center, Palo Alto, California (1993)

  5. Karttunen, L.: Constructing Lexical Transducers. In: The Proceedings of the 15th International Conference on Computational Linguistics COLING 1994, I, pp. 406–411 (1994)

  6. Beesley, K., Karttunen, L.: Finite State Morphology. CSLI Publications, Stanford (2003), http://www.fsmbook.com

  7. Schmid, H.: A programming language for finite state transducers. In: Yli-Jyrä, A., Karttunen, L., Karhumäki, J. (eds.) FSMNLP 2005. LNCS (LNAI), vol. 4002, pp. 308–309. Springer, Heidelberg (2006)

  8. Lombardy, S., Régis-Gianas, Y., Sakarovitch, J.: Introducing Vaucanson. Theoretical Computer Science 328, 77–96 (2004)

  9. Allauzen, C., Riley, M., Schalkwyk, J., Skut, W., Mohri, M.: OpenFst: A general and efficient weighted finite-state transducer library. In: Holub, J., Žďárek, J. (eds.) CIAA 2007. LNCS, vol. 4783, pp. 11–23. Springer, Heidelberg (2007), http://www.openfst.org

  10. Yli-Jyrä, A., Koskenniemi, K.: Compiling Generalized Two-Level Rules and Grammars. In: Salakoski, T., Ginter, F., Pyysalo, S., Pahikkala, T. (eds.) FinTAL 2006. LNCS (LNAI), vol. 4139, pp. 174–185. Springer, Heidelberg (2006)

  11. Pirinen, T.: Suomen kielen äärellistilainen automaattinen morfologia avoimen lähdekoodin menetelmin. Master’s thesis, Helsingin yliopisto (2008)

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lindén, K., Silfverberg, M., Pirinen, T. (2009). HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers. In: Mahlow, C., Piotrowski, M. (eds) State of the Art in Computational Morphology. SFCM 2009. Communications in Computer and Information Science, vol 41. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04131-0_3

  • DOI: https://doi.org/10.1007/978-3-642-04131-0_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04130-3

  • Online ISBN: 978-3-642-04131-0

  • eBook Packages: Computer Science (R0)
