Abstract
This paper presents implementations of a generative method for managing morphological variation of query keywords. The method, called Frequent Case Generation (FCG), is based on the skewed distributions of word forms in natural languages and is suitable for languages that have either a fair amount of morphological variation or very rich morphology. The paper reports the implementation and evaluation of automatic procedures for generating variant query keyword forms, using short and long queries from CLEF collections for English, Finnish, German and Swedish. The evaluated languages show varying degrees of morphological complexity.
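The idea of exploiting a skewed distribution of word forms can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the coverage threshold, and the case counts are all hypothetical and invented for illustration. Only the Finnish forms of "talo" (house) are real. The sketch picks the most frequent cases until they cover a chosen share of occurrences, then expands a keyword into just those forms.

```python
from collections import Counter

def frequent_cases(case_counts, coverage):
    """Return the most frequent cases, in descending order of frequency,
    until their cumulative count reaches `coverage` (a share between 0 and 1)
    of all occurrences. Hypothetical helper, not from the paper."""
    total = sum(case_counts.values())
    selected, cumulative = [], 0
    for case, count in Counter(case_counts).most_common():
        if cumulative >= coverage * total:
            break
        selected.append(case)
        cumulative += count
    return selected

def expand_keyword(paradigm, cases):
    """Return the keyword's forms for the selected cases only."""
    return [paradigm[case] for case in cases if case in paradigm]

# Invented counts per 100 noun tokens; real corpus figures would differ.
toy_case_counts = {
    "NOM": 30, "GEN": 25, "PAR": 15, "INE": 10,
    "ILL": 8, "ELA": 7, "ADE": 5,
}

# Singular forms of Finnish "talo" (house) for the cases above.
talo_paradigm = {
    "NOM": "talo", "GEN": "talon", "PAR": "taloa", "INE": "talossa",
    "ILL": "taloon", "ELA": "talosta", "ADE": "talolla",
}

top_cases = frequent_cases(toy_case_counts, coverage=0.75)
query_variants = expand_keyword(talo_paradigm, top_cases)
```

With these toy counts, four of the seven cases are enough to pass the 75% threshold, so the query is expanded with four forms rather than the full paradigm. In a real setting the counts would come from corpus statistics and the forms from a morphological generator.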
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kettunen, K. (2008). Automatic Generation of Frequent Case Forms of Query Keywords in Text Retrieval. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_22
DOI: https://doi.org/10.1007/978-3-540-85287-2_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85286-5
Online ISBN: 978-3-540-85287-2
eBook Packages: Computer Science, Computer Science (R0)