Frontiers of Computer Science, Volume 12, Issue 1, pp 163–176

Strength Pareto fitness assignment for pseudo-relevance feedback: application to MEDLINE

  • Ilyes Khennak
  • Habiba Drias
Research Article

Abstract

Because users increasingly describe their information needs with vague and imprecise keywords, it has become necessary to expand their original search queries with additional words that better capture their actual intent. The suitability of a candidate expansion term generally depends on how closely it is related to the query keywords. In this paper, we propose two criteria for evaluating this relatedness: (1) co-occurrence frequency, where more importance is given to terms that occur in as many as possible of the documents in which the query keywords appear; (2) proximity, where more importance is given to terms that lie a short distance from the query terms within documents. We also employ the strength Pareto fitness assignment in order to satisfy both criteria simultaneously. The results of our numerical experiments on MEDLINE, the online medical information database, show that the proposed approach significantly enhances retrieval performance compared with the baseline.
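
As a rough illustration of how a strength Pareto fitness could rank candidate expansion terms on two criteria at once, the sketch below uses the raw-fitness rule known from SPEA2: a term's strength is the number of candidates it dominates, and its fitness is the sum of the strengths of the candidates that dominate it (0 means non-dominated). This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact formulation, the density term of SPEA2 is omitted, and the names `dominates`, `strength_pareto_fitness`, `select_expansion_terms`, and the example scores are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input: term -> (co-occurrence score, proximity score).
# Assumption: both scores are already oriented so that higher is better.
Scores = Dict[str, Tuple[float, float]]


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def strength_pareto_fitness(scores: Scores) -> Dict[str, int]:
    """Raw strength Pareto fitness per term (lower is better, 0 = non-dominated)."""
    terms = list(scores)
    # Strength: number of other candidates that each term dominates.
    strength = {
        t: sum(dominates(scores[t], scores[u]) for u in terms if u != t)
        for t in terms
    }
    # Raw fitness: sum of the strengths of every candidate dominating the term.
    return {
        t: sum(strength[u] for u in terms if u != t and dominates(scores[u], scores[t]))
        for t in terms
    }


def select_expansion_terms(scores: Scores, k: int) -> List[str]:
    """Pick the k terms with the lowest fitness; ties are broken alphabetically."""
    fitness = strength_pareto_fitness(scores)
    return sorted(scores, key=lambda t: (fitness[t], t))[:k]


# Made-up scores, for illustration only.
candidates: Scores = {
    "insulin": (0.9, 0.8),
    "glucose": (0.7, 0.9),
    "therapy": (0.6, 0.5),
    "patient": (0.8, 0.4),
    "the": (0.5, 0.1),
}
fitness = strength_pareto_fitness(candidates)
top_terms = select_expansion_terms(candidates, 3)
```

With these made-up scores, "insulin" and "glucose" are mutually non-dominated and both receive fitness 0, so neither criterion has to be traded off against the other through a hand-tuned weight.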

Keywords

information retrieval, query expansion, pseudo-relevance feedback, proximity, multi-objective optimization, Pareto dominance, MEDLINE

Supplementary material

11704_2016_5560_MOESM1_ESM.ppt (353 kb)

Copyright information

© Higher Education Press and Springer-Verlag GmbH Germany 2018

Authors and Affiliations

  1. Laboratory for Research in Artificial Intelligence, Computer Science Department, University of Sciences and Technology Houari Boumediene (USTHB), Algiers, Algeria
