Recursive Decompounding in Afrikaans

  • Tilla Fick
  • Chris Swanepoel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6836)

Abstract

An algorithm has been developed to decompose compound words in Afrikaans. This data-driven technique recursively uses an extensive list of Afrikaans words in the decompounding process. String fitting from the beginning and end of words forms the basis of the process. It is supported by sublists of short words that may occur only at the beginning or only at the end of words, and by lists of prefixes and suffixes. Applying the algorithm to the original lexicon of 182 433 words yielded an accuracy of 90.2%, a precision of 99.9% and a recall of 83.6%.
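The abstract gives no implementation details, so the following is only a minimal sketch of what recursive, lexicon-driven string fitting could look like. It is not the authors' algorithm. The function names, the `MIN_LEN` threshold, the longest-head-first search order, and the tiny example word lists are all assumptions made for illustration. Prefix and suffix lists are left out.

```python
from typing import FrozenSet, List, Optional

# Assumed threshold: words shorter than this are accepted as components only
# if they appear in the position-restricted short-word lists.
MIN_LEN = 3


def _split(word: str, lexicon: FrozenSet[str], start_only: FrozenSet[str],
           end_only: FrozenSet[str], at_start: bool) -> Optional[List[str]]:
    """Fit a known word at the beginning, then fit the rest (hypothetical helper)."""
    # Try the longest candidate head first (an assumed heuristic).
    for i in range(len(word) - 1, 0, -1):
        head, tail = word[:i], word[i:]
        head_ok = (head in lexicon and len(head) >= MIN_LEN) or (
            at_start and head in start_only)
        if not head_ok:
            continue
        # The remainder fits as a whole word at the end of the compound.
        if (tail in lexicon and len(tail) >= MIN_LEN) or tail in end_only:
            return [head, tail]
        # Otherwise recurse: the remainder may itself be a compound.
        rest = _split(tail, lexicon, start_only, end_only, at_start=False)
        if rest is not None:
            return [head] + rest
    return None


def decompound(word: str, lexicon: FrozenSet[str],
               start_only: FrozenSet[str] = frozenset(),
               end_only: FrozenSet[str] = frozenset()) -> List[str]:
    """Return the components of `word`, or [word] if no split is found."""
    parts = _split(word, lexicon, start_only, end_only, at_start=True)
    return parts if parts is not None else [word]


# Tiny illustrative word lists, not taken from the paper's lexicon.
LEXICON = frozenset({"rekenaar", "program", "slaap", "kamer", "deur", "week"})
START_ONLY = frozenset({"na"})

print(decompound("rekenaarprogram", LEXICON))        # two components
print(decompound("slaapkamerdeur", LEXICON))         # needs one recursive step
print(decompound("naweek", LEXICON, START_ONLY))     # short word allowed at start only
```

In this sketch a short word such as "na" is accepted only in the position its sublist allows, which is one way to read the role of the position-restricted sublists mentioned in the abstract.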

Keywords

Reference List, Machine Translation, Compound Word, Short Word, Statistical Machine Translation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Tilla Fick (1)
  • Chris Swanepoel (1)
  1. Department of Decision Sciences, University of South Africa, Pretoria, South Africa
