Abstract
Our goal in participating in the FIRE 2011 evaluation campaign is to analyse and evaluate the retrieval effectiveness of our retrieval system on the Marathi language. We developed a light and an aggressive stemmer for this language, as well as a stopword list. In our experiments, seven IR models (language model, DFR-PL2, DFR-PB2, DFR-GL2, DFR-I(n_e)C2, tf idf and Okapi) were used to evaluate the influence of these stemmers, and of the language-independent n-gram and trunc-n indexing strategies, on retrieval performance. We also applied pseudo-relevance feedback (blind query expansion) to estimate its impact on retrieval effectiveness. Our results show that, for Marathi, the DFR-I(n_e)C2, DFR-PL2 and Okapi models give the best performance. For this language, the trunc-n indexing strategy gives the best retrieval effectiveness compared to the other stemming and indexing approaches. The adopted pseudo-relevance feedback approach also tends to enhance retrieval effectiveness.
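The two language-independent indexing strategies named above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the whitespace tokenisation and the value n = 4 are assumptions made for illustration, since the abstract does not state which n was used. Trunc-n keeps only the first n characters of each token, while the n-gram strategy emits every overlapping character sequence of length n.

```python
from typing import List


def trunc_n(token: str, n: int = 4) -> str:
    """Trunc-n: keep only the first n characters of a token."""
    return token[:n]


def char_ngrams(token: str, n: int = 4) -> List[str]:
    """Overlapping character n-grams; tokens no longer than n are kept whole."""
    if len(token) <= n:
        return [token]
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def index_terms(text: str, strategy: str, n: int = 4) -> List[str]:
    """Turn a text into indexing units (hypothetical helper, whitespace tokens)."""
    tokens = text.split()
    if strategy == "trunc":
        return [trunc_n(t, n) for t in tokens]
    if strategy == "ngram":
        return [g for t in tokens for g in char_ngrams(t, n)]
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    print(index_terms("retrieval effectiveness", "trunc"))
    print(char_ngrams("retrieval"))
```

Python slices strings by Unicode code point, so for Devanagari text a dependent vowel sign counts as its own character. Whether the original system counted code points or full syllables is not stated in the abstract.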
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Akasereh, M., Savoy, J. (2013). Ad Hoc Retrieval with Marathi Language. In: Majumder, P., Mitra, M., Bhattacharyya, P., Subramaniam, L.V., Contractor, D., Rosso, P. (eds) Multilingual Information Access in South Asian Languages. Lecture Notes in Computer Science, vol 7536. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40087-2_3
DOI: https://doi.org/10.1007/978-3-642-40087-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40086-5
Online ISBN: 978-3-642-40087-2
eBook Packages: Computer Science, Computer Science (R0)