Indonesian Morphology Tool (MorphInd): Towards an Indonesian Corpus

  • Septina Dian Larasati
  • Vladislav Kuboň
  • Daniel Zeman
Conference paper

DOI: 10.1007/978-3-642-23138-4_8

Part of the Communications in Computer and Information Science book series (CCIS, volume 100)
Cite this paper as:
Larasati S.D., Kuboň V., Zeman D. (2011) Indonesian Morphology Tool (MorphInd): Towards an Indonesian Corpus. In: Mahlow C., Piotrowski M. (eds) Systems and Frameworks for Computational Morphology. SFCM 2011. Communications in Computer and Information Science, vol 100. Springer, Berlin, Heidelberg

Abstract

This paper describes MorphInd, a robust finite-state morphology tool for Indonesian that performs both morphological analysis and lemmatization of a given surface word form, making its output suitable for further language processing. Compared to an existing Indonesian morphological analyzer [1], MorphInd offers wider coverage of Indonesian derivational and inflectional morphology and a more detailed tagset. It outputs each analysis as a sequence of segmented morphemes together with their morphological tags. The tool is implemented with finite-state technology, adopting the two-level morphology approach as implemented in Foma. On a preliminary-stage Indonesian corpus it achieved 84.6% coverage; as initially expected, most of the failures are proper nouns and foreign words.
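
To make the notions of "segmented morphemes with tags" and "coverage" concrete, the following is a minimal illustrative sketch, not MorphInd itself. It swaps the finite-state, two-level approach for plain dictionary lookup with affix stripping in Python, and it does not use Foma. The lexicon, the affix tables, the tag labels, and the output format (`meN+baca<VERB>`) are all hypothetical and chosen only for illustration. Coverage is taken here to mean the fraction of tokens that receive at least one analysis, which is an assumption about how the figure above is computed.

```python
from typing import List, Optional

# Hypothetical toy lexicon and affix tables (NOT MorphInd's actual data or tagset).
ROOTS = {"baca": "VERB", "makan": "VERB", "buku": "NOUN"}
PREFIXES = {"mem": "meN+", "di": "di+", "": ""}   # surface prefix -> illustrative label
SUFFIXES = {"an": "+an", "": ""}                  # surface suffix -> illustrative label


def analyze(word: str) -> Optional[str]:
    """Return a segmented analysis such as 'meN+baca<VERB>', or None if unknown."""
    word = word.lower()
    for prefix, prefix_label in PREFIXES.items():
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix, suffix_label in SUFFIXES.items():
            if suffix and not rest.endswith(suffix):
                continue
            # Strip the candidate suffix and look the remainder up as a root.
            root = rest[: len(rest) - len(suffix)]
            if root in ROOTS:
                return f"{prefix_label}{root}<{ROOTS[root]}>{suffix_label}"
    return None


def coverage(tokens: List[str]) -> float:
    """Fraction of tokens for which analyze() finds an analysis (assumed definition)."""
    if not tokens:
        return 0.0
    analyzed = sum(1 for token in tokens if analyze(token) is not None)
    return analyzed / len(tokens)


if __name__ == "__main__":
    sample = ["membaca", "buku", "dibaca", "makanan", "Jakarta", "software"]
    for token in sample:
        print(token, "->", analyze(token))
    print("coverage:", coverage(sample))
```

In this toy setting the proper noun "Jakarta" and the foreign word "software" receive no analysis because they are absent from the lexicon, which mirrors the kind of failure the abstract reports. A real finite-state analyzer would also model phonological alternations (for example, the different surface forms of the nasal prefix) with rules rather than with a fixed list of surface prefixes.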

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Septina Dian Larasati (1)
  • Vladislav Kuboň (1)
  • Daniel Zeman (1)

  1. Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Charles University, Prague, Czech Republic
