Information Retrieval, Volume 12, Issue 3, pp 230–250

Current research issues and trends in non-English Web searching

  • Fotis Lazarinis
  • Jesús Vilares
  • John Tait
  • Efthimis N. Efthimiadis

Abstract

As the number of non-English-language Web searchers grows, the efficient handling of non-English Web documents and user queries is becoming a major issue for search engines. The main aim of this review paper is to make researchers aware of the existing problems in monolingual non-English Web retrieval by providing an overview of open issues. A significant number of papers are reviewed, and the research issues they investigate are categorized in order to identify the research questions and the solutions proposed. Further research is proposed at the end of each section.

Keywords

Non-English retrieval, Web searching, Query log analysis, Segmentation, Indexing, Stopwords, Stemming, Lemmatization, Language identification, Encoding handling

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Fotis Lazarinis (1)
  • Jesús Vilares (2)
  • John Tait (3)
  • Efthimis N. Efthimiadis (4)

  1. Technological Educational Institute of Mesolonghi, Mesolonghi, Greece
  2. Department of Computer Science, University of A Coruña, A Coruña, Spain
  3. Information Retrieval Facility, Vienna, Austria
  4. The Information School, University of Washington, Seattle, USA
