Investigation of Passage Based Ranking Models to Improve Document Retrieval

  • Ghulam Sarwar
  • Colm O’Riordan
  • John Newell
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 976)

Abstract

Passage retrieval deals with identifying and retrieving small but explanatory portions of a document that answer a user’s query. In this paper, we focus on improving document ranking by using different passage-based evidence. Several similarity measures were evaluated and a more in-depth analysis was undertaken into the effect of varying specific. We have also explored the notion of query difficulty to understand whether or not the best-performing passage-based approach improves the performance of certain queries. Experimental results indicate that, for the passage-level technique, the worst-performing queries are damaged slightly and those that perform well are boosted for the WebAp collection. However, our rank-based similarity function boosted the performance of the difficult queries in the Ohsumed collection.

Keywords

Document retrieval; Passage-based document retrieval; Passage similarity functions; Inverse rank; Query difficulty

Notes

Acknowledgements

This work is supported by the Irish Research Council Employment Based Programme.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Information Technology, National University of Ireland, Galway, Ireland
  2. School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway, Ireland
