Morpho Challenge Evaluation by Information Retrieval Experiments
In Morpho Challenge competitions, the objective has been to design statistical machine learning algorithms that discover the morphemes (the smallest individually meaningful units of language) of which words consist. Ideally, these are basic vocabulary units suitable for different tasks, such as text understanding, machine translation, information retrieval (IR), and statistical language modeling. In this paper, we propose to evaluate the morpheme analyses by performing IR experiments, where the words in the documents and queries are replaced by their proposed morpheme representations and the search is based on morphemes instead of words. The evaluations are run for three languages (Finnish, German, and English) using the queries, texts, and relevance judgments available in the CLEF forum. The results show that the morpheme analysis has a significant effect on IR performance in all three languages, and that the performance of the best unsupervised methods can be superior to that of the supervised reference methods.
Keywords: Morphological analysis, Machine learning
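The evaluation idea described in the abstract, replacing words in documents and queries by their morpheme representations before searching, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `ANALYSES` table, the `+PL`/`+ING` labels, the toy documents, and the simple TF-IDF scoring are hypothetical and are not the analyses, data, or retrieval model used in the paper.

```python
import math
from collections import Counter

# Hypothetical morpheme analyses (word -> morpheme labels), invented for
# illustration; a real run would use the output of a morpheme analysis method.
ANALYSES = {
    "houses": ["house", "+PL"],
    "house": ["house"],
    "housing": ["house", "+ING"],
    "boats": ["boat", "+PL"],
    "boat": ["boat"],
}


def to_morphemes(text, analyses):
    """Replace each word by its morphemes; unanalysed words are kept as-is."""
    terms = []
    for word in text.lower().split():
        terms.extend(analyses.get(word, [word]))
    return terms


def rank(query, docs, analyses):
    """Rank documents by a simple TF-IDF sum over the query's index terms.

    This scoring is an assumed stand-in, not the paper's retrieval model.
    """
    doc_terms = {d: Counter(to_morphemes(t, analyses)) for d, t in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for terms in doc_terms.values():
        doc_freq.update(terms.keys())

    scores = {}
    for doc_id, terms in doc_terms.items():
        score = 0.0
        for term in to_morphemes(query, analyses):
            if term in terms:
                score += terms[term] * math.log((n_docs + 1) / doc_freq[term])
        scores[doc_id] = score
    # Highest-scoring documents first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "d1": "old houses by the lake",
    "d2": "boats on the lake",
    "d3": "the weather report",
}

morpheme_ranking = rank("house", docs, ANALYSES)  # search over morphemes
word_ranking = rank("house", docs, {})            # search over plain words
```

In this toy setup the query "house" matches document `d1` only when the index terms are morphemes, because the plain word index contains "houses" but not "house". This is the kind of effect on retrieval performance that the IR-based evaluation is meant to measure.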