Indonesian Morphology Tool (MorphInd): Towards an Indonesian Corpus

  • Conference paper

Part of the Communications in Computer and Information Science book series (CCIS, volume 100)

Abstract

This paper describes MorphInd, a robust finite-state morphology tool for Indonesian that performs both morphological analysis and lemmatization of a given surface word form, so that its output is suitable for further language processing. Compared to an existing Indonesian morphological analyzer [1], MorphInd covers more of Indonesian derivational and inflectional morphology and uses a more detailed tagset. MorphInd outputs its analysis as segmented morphemes together with their morphological tags. It was implemented with finite-state technology, adopting the two-level morphology approach as implemented in Foma. On an Indonesian corpus still at a preliminary stage it achieved 84.6% coverage; as initially expected, most of the failures are proper nouns and foreign words.
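
To make "segmented morphemes along with the morphological tags" concrete, the following is a minimal Python sketch of the kind of analysis involved for the Indonesian meN- prefix, whose nasal assimilates to the root and deletes a root-initial voiceless consonant (tulis becomes menulis). It is not MorphInd: the real tool is compiled with Foma as a finite-state transducer, whereas this toy uses plain string rules. The lexicon, the meN+root<tag>+suffix notation, and the names analyze and coverage are hypothetical. The coverage helper assumes coverage means the share of tokens that receive at least one analysis, so unknown proper nouns and foreign words count as failures.

```python
from typing import List

# Hypothetical toy lexicon: root -> coarse POS tag (not MorphInd's lexicon or tagset).
ROOTS = {
    "tulis": "v", "pukul": "v", "sapu": "v", "kirim": "v",
    "baca": "v", "ambil": "v", "rumah": "n",
}

# Allomorphs of meN-: (surface prefix, root-initial consonant deleted at the
# morpheme boundary, root initials that condition this allomorph).
MEN_ALLOMORPHS = [
    ("mem", "p", "p"),         # meN- + pukul -> memukul
    ("mem", "", "bf"),         # meN- + baca  -> membaca
    ("men", "t", "t"),         # meN- + tulis -> menulis
    ("men", "", "dcj"),        # meN- + d/c/j keeps the consonant
    ("meny", "s", "s"),        # meN- + sapu  -> menyapu
    ("meng", "k", "k"),        # meN- + kirim -> mengirim
    ("meng", "", "ghaeiou"),   # meN- + ambil -> mengambil
    ("me", "", "lmnrwy"),      # me- before sonorants
]
SUFFIXES = ["", "kan", "i"]    # illustrative subset of verbal suffixes


def analyze(word: str) -> List[str]:
    """Return every segmentation of `word` licensed by the toy lexicon."""
    word = word.lower()
    analyses = []
    for suffix in SUFFIXES:
        if suffix and not word.endswith(suffix):
            continue
        base = word[: len(word) - len(suffix)]
        tail = f"+{suffix}" if suffix else ""
        # Bare root, optionally followed by a suffix.
        if base in ROOTS:
            analyses.append(f"{base}<{ROOTS[base]}>{tail}")
        # Undo meN- allomorphy: strip the surface prefix and restore any
        # root-initial consonant deleted at the boundary.
        for surface, deleted, initials in MEN_ALLOMORPHS:
            if not base.startswith(surface):
                continue
            stem = deleted + base[len(surface):]
            if stem and stem[0] in initials and stem in ROOTS:
                analyses.append(f"meN+{stem}<{ROOTS[stem]}>{tail}")
    return analyses


def coverage(tokens: List[str]) -> float:
    """Fraction of tokens that receive at least one analysis."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if analyze(t)) / len(tokens)
```

For example, analyze("mengirimkan") returns ["meN+kirim<v>+kan"], while analyze("Jakarta") returns an empty list because the proper noun is absent from the lexicon, which is the failure pattern the abstract reports. A finite-state implementation expresses the same prefix alternations as composed rewrite rules, so analysis and generation come from one transducer instead of hand-written string matching.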

References

  1. Pisceldo, F., Mahendra, R., Manurung, R., Arka, I.W.: A Two-Level Morphological Analyser for Indonesian. In: Abstract Submitted to the Australasian Language Technology (ALTA) Workshop 2008, Tasmania (2008)

  2. Siregar, N.: Pencarian Kata Berimbuhan pada Kamus Besar Bahasa Indonesia dengan menggunakan Algoritma Stemming. Undergraduate thesis, Faculty of Computer Science, University of Indonesia (1995)

  3. Adriani, M., Asian, J., Nazief, B., Tahaghoghi, S.M.M., Williams, H.E.: Stemming Indonesian: A Confix-Stripping Approach. ACM Transactions on Asian Language Information Processing 6(4) (2007)

  4. Hartono, H.: Pengembangan Pengurai Morfologi untuk Bahasa Indonesia dengan Model Morfologi Dua Tingkat Berbasiskan PC-KIMMO. Undergraduate thesis, Faculty of Computer Science, University of Indonesia (2002)

  5. Beesley, K.R., Karttunen, L.: Finite State Morphology. CSLI Publications, Palo Alto (2003)

  6. Hulden, M.: Foma: a finite-state compiler and library. In: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics: Demonstrations Session, Athens, Greece, pp. 29–32 (2009)

  7. Pusat Bahasa: Kamus Besar Bahasa Indonesia Daring (2008), http://pusatbahasa.diknas.go.id/kbbi/ (last access: February 14, 2011)

  8. Darma Putra, D., Arfan, A., Manurung, R.: Building an Indonesian Wordnet. In: The Second International MALINDO Workshop (2008), http://bahasa.cs.ui.ac.id/wordnet/ (last access: February 14, 2011)

  9. Pisceldo, F., Manurung, R., Adriani, M.: Probabilistic Part-of-Speech Tagging for Bahasa Indonesia. In: The Third International MALINDO Workshop, Colocated Event ACL-IJCNLP 2009, Singapore, August 1 (2009)

  10. Farizki Wicaksono, A., Purwarianti, A.: HMM Based Part-of-Speech Tagger for Bahasa Indonesia. In: The Fourth International MALINDO Workshop, Jakarta, Indonesia (2010)

  11. Joice: Pengembangan Lanjut Pengurai Struktur Kalimat Bahasa Indonesia yang Menggunakan Constraint-Based Formalism. Undergraduate thesis, Faculty of Computer Science, University of Indonesia (2002)

  12. Hari Gusmita, R., Manurung, R.: Some Initial Experiments with Indonesian Probabilistic Parsing. In: The Second International MALINDO Workshop (2008)

  13. Dian Larasati, S., Manurung, R.: Towards a Semantic Analysis of Bahasa Indonesia for Question Answering. In: Proceedings of the 10th Conference of the Pacific Association for Computational Linguistics, PACLING (2007)

  14. Mahendra, R., Dian Larasati, S., Manurung, R.: Extending an Indonesian Semantic Analysis-based Question Answering System with Linguistic and World Knowledge Axioms. In: Proceedings of the 22nd Pacific Asia Conference on Language, Information, and Computation (PACLIC 2008), pp. 262–271 (2008)

  15. PAN Localization, http://www.panl10n.net/english/OutputsIndonesia2.htm (last access: February 14, 2011)

  16. Prague Markup Language (PML), http://ufal.mff.cuni.cz/jazz/pml/index_en.html (last access: February 14, 2011)

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Larasati, S.D., Kuboň, V., Zeman, D. (2011). Indonesian Morphology Tool (MorphInd): Towards an Indonesian Corpus. In: Mahlow, C., Piotrowski, M. (eds) Systems and Frameworks for Computational Morphology. SFCM 2011. Communications in Computer and Information Science, vol 100. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23138-4_8

  • DOI: https://doi.org/10.1007/978-3-642-23138-4_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23137-7

  • Online ISBN: 978-3-642-23138-4

  • eBook Packages: Computer Science, Computer Science (R0)
