JMorpher: A Finite-State Morphological Parser in Java for Android

  • Leonel F. de Alencar
  • Mardonio J. C. França
  • Katiuscia M. Andrade
  • Philipp B. Costa
  • Henrique S. Vasconcelos
  • Francinaldo P. Madeira
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8775)

Abstract

This paper presents JMorpher, a morphological parsing utility implemented in pure Java. It is apparently the first tool of this type to run natively on Android mobile devices. JMorpher compiles a lexical transducer definition in the AT&T raw text format, of the type generated by Foma and other open-source finite-state packages, into an internal Java representation, which it then uses to parse input strings. Besides the API, JMorpher includes a simple graphical interface that allows the user to load a transducer file, type in some text, and parse it. Results of an evaluation based on large Portuguese lexical transducers of varying degrees of complexity are provided. The implementation was shown to be very efficient on a desktop PC. Although JMorpher’s performance is much lower on an Android smartphone, it is still suited to the needs of NLP tasks in this environment.
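To make the compile-then-parse idea concrete, the following is a minimal sketch in Java, not JMorpher's actual API or internal representation, neither of which is described on this page. The class name `AttLookupSketch`, its methods, and the toy transducer are all hypothetical. The sketch assumes tab-separated AT&T lines (`source target upper lower` for arcs, a lone state number for final states), `@0@` as the epsilon symbol, state 0 as the start state, analyses on the upper side, and no epsilon cycles on the surface side.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this is not JMorpher's real class or API.
class AttLookupSketch {
    static final String EPS = "@0@"; // assumed epsilon symbol in the AT&T text file

    // Toy transducer: surface "gato" -> "gato+N+Sg", surface "gatos" -> "gato+N+Pl".
    static final String SAMPLE =
        "0\t1\tg\tg\n1\t2\ta\ta\n2\t3\tt\tt\n3\t4\to\to\n"
        + "4\t5\t+N\t@0@\n5\t6\t+Sg\t@0@\n5\t7\t+Pl\ts\n6\n7\n";

    static final class Arc {
        final int target; final String upper; final String lower;
        Arc(int target, String upper, String lower) {
            this.target = target; this.upper = upper; this.lower = lower;
        }
    }

    private final Map<Integer, List<Arc>> arcs = new HashMap<>();
    private final Set<Integer> finals = new HashSet<>();

    // "Compile" the text definition into in-memory maps, one line at a time.
    static AttLookupSketch compile(String att) {
        AttLookupSketch fst = new AttLookupSketch();
        for (String line : att.split("\n")) {
            if (line.isEmpty()) continue;
            String[] f = line.split("\t");
            if (f.length >= 4) {            // arc line (an optional weight column is ignored)
                int source = Integer.parseInt(f[0]);
                fst.arcs.computeIfAbsent(source, k -> new ArrayList<>())
                        .add(new Arc(Integer.parseInt(f[1]), f[2], f[3]));
            } else {                        // final-state line
                fst.finals.add(Integer.parseInt(f[0]));
            }
        }
        return fst;
    }

    // Return every upper-side string paired with the given surface form.
    List<String> parse(String surface) {
        List<String> results = new ArrayList<>();
        walk(0, surface, 0, new StringBuilder(), results);
        return results;
    }

    // Depth-first search that consumes lower-side symbols and collects upper-side ones.
    private void walk(int state, String s, int pos, StringBuilder acc, List<String> out) {
        if (pos == s.length() && finals.contains(state)) out.add(acc.toString());
        for (Arc a : arcs.getOrDefault(state, Collections.<Arc>emptyList())) {
            int next;
            if (a.lower.equals(EPS)) next = pos;                     // consumes no input
            else if (s.startsWith(a.lower, pos)) next = pos + a.lower.length();
            else continue;                                           // arc does not match
            int mark = acc.length();
            if (!a.upper.equals(EPS)) acc.append(a.upper);
            walk(a.target, s, next, acc, out);
            acc.setLength(mark);                                     // backtrack
        }
    }

    public static void main(String[] args) {
        AttLookupSketch fst = compile(SAMPLE);
        System.out.println(fst.parse("gatos")); // prints [gato+N+Pl]
    }
}
```

A recursive search over hash maps keeps the sketch short. A production parser for large lexical transducers on a memory-constrained device would more plausibly use compact array-based tables and guard against epsilon cycles.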

Keywords

NLP, Finite-State Morphology, Morphological Analysis, Morphological Parsing, Lexical Transducer, Android Technology

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Leonel F. de Alencar (1)
  • Mardonio J. C. França (1)
  • Katiuscia M. Andrade (1)
  • Philipp B. Costa (1)
  • Henrique S. Vasconcelos (1)
  • Francinaldo P. Madeira (1)

  1. Group of Computer Networks, Software Engineering, and Systems (GREat), Universidade Federal do Ceará, Fortaleza, Brazil
