Enhancing the Healthcare Retrieval with a Self-adaptive Saturated Density Function

  • Yang Song
  • Wenxin Hu
  • Liang He
  • Liang Dou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11439)

Abstract

Proximity-based information retrieval models usually apply the same pre-defined density function to all terms in a collection to estimate their influence distributions. In the healthcare domain, however, different terms in the same document have different influence distributions, the same term has different influence distributions in different documents, and a pre-defined density function may not match a term’s actual influence distribution. In this paper, we define the saturated density function as the density function that best fits a given term’s influence distribution, and we propose a self-adaptive approach that builds a saturated density function for each term in each circumstance. In particular, our approach uses a Gamma process and is an unsupervised model that requires no external resources. We then construct a density-based weighting method to evaluate the effectiveness of our approach. Finally, we conduct experiments on five standard CLEF and TREC datasets; the results show that our approach is promising and outperforms pre-defined density functions in healthcare retrieval.
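To make the notion of a pre-defined density function concrete, the following is a minimal sketch of the baseline setting the abstract criticizes: one fixed kernel shared by every term. It is not the paper's saturated density function or its Gamma process. The Gaussian kernel, the function names, and the parameter `sigma` are assumptions chosen only for illustration.

```python
import math


def gaussian_density(distance, sigma=2.0):
    """Influence of one term occurrence at a given token distance.

    Assumption: a Gaussian kernel stands in for "a pre-defined density
    function"; the same shape and width are used for every term.
    """
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


def influence_profile(positions, doc_length, sigma=2.0):
    """Sum the influence of all occurrences of a term at each position."""
    return [
        sum(gaussian_density(i - p, sigma) for p in positions)
        for i in range(doc_length)
    ]


# Hypothetical term occurring at token positions 2 and 7 in a 10-token document.
narrow = influence_profile(positions=[2, 7], doc_length=10, sigma=1.0)
wide = influence_profile(positions=[2, 7], doc_length=10, sigma=3.0)
```

Because `sigma` is fixed in advance, every term receives the same influence shape regardless of the document. A self-adaptive approach, as described in the abstract, would instead fit the density to each term's observed distribution.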

Keywords

Saturated density function, Information retrieval, Self-adaptive

Notes

Acknowledgements

We thank all reviewers who provided thoughtful and constructive comments on this paper. The second author is the corresponding author. This research is supported by the open funds of NPPA Key Laboratory of Publishing Integration Development, ECNUP, the Shanghai Municipal Commission of Economy and Informatization (No. 170513), and Xiaoi Research. The computation was performed at the Supercomputer Center of ECNU.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science and Technology, East China Normal University, Shanghai, China
  2. NPPA Key Laboratory of Publishing Integration Development, ECNUP, Shanghai, China
