Abstract
High variability in content and structure, combined with transcription errors, makes effective information retrieval (IR) from archives of spoken user-generated content (UGC) very challenging. Previous research has shown that using passage-level evidence for query expansion (QE) in IR can improve search effectiveness. Our investigation of passage-level QE for a large Internet collection of UGC demonstrates that, while it is effective for this task, the informal and variable nature of UGC means that different queries respond better to different types of passages or, in some cases, to the use of whole documents rather than extracted passages. We investigate the use of Query Performance Prediction (QPP) to select the appropriate passage type for each query, and introduce a novel QPP method, Weighted Expansion Gain (WEG). Our experimental investigation, using an extended ad hoc search task based on the MediaEval 2012 Search task, shows the superiority of our proposed adaptive QE approach for retrieval. The effectiveness of this method is shown in a per-query evaluation of utilising passage and full-document evidence for QE within the inconsistent, uncertain settings of UGC retrieval.
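The per-query selection idea described above can be illustrated with a minimal sketch. It assumes that a predictor assigns each candidate evidence type (for example, passages or full documents) a score, and that the highest-scoring type is used for expansion. The function names, the candidate labels, and the toy term-overlap predictor are all hypothetical and serve only as a stand-in; they are not the WEG method, whose definition is not given in this abstract.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a predictor maps (query, feedback units) to a score,
# where a higher score means expansion from those units is predicted to help more.
Predictor = Callable[[str, List[str]], float]


def select_expansion_source(
    query: str,
    candidates: Dict[str, List[str]],
    predictor: Predictor,
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate evidence type and return the best one with all scores."""
    scores = {name: predictor(query, units) for name, units in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


def overlap_predictor(query: str, units: List[str]) -> float:
    """Toy stand-in predictor (NOT WEG): mean fraction of query terms present in each unit."""
    terms = set(query.lower().split())
    if not terms or not units:
        return 0.0
    per_unit = [len(terms & set(u.lower().split())) / len(terms) for u in units]
    return sum(per_unit) / len(per_unit)


# Illustrative data only: one full transcript versus one extracted passage.
candidates = {
    "full_document": ["a long rambling video transcript about cooking travel and pasta"],
    "passage": ["how to cook fresh pasta at home"],
}
best, scores = select_expansion_source("cook fresh pasta", candidates, overlap_predictor)
print(best, scores)
```

In this sketch the choice is made independently for each query, so one query may be expanded from passages while another falls back to whole documents. Any real predictor would replace `overlap_predictor` without changing the selection step.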
Notes
- 1.
- 2.
- 3.
- 4.
- 5. Confirmed by running a query-level paired t-test comparison at the 0.05 significance level [21].
References
Allan, J.: Relevance feedback with too much data. In: Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 337–343. ACM (1995)
Amati, G., Van Rijsbergen, C.J.: Probabilistic models of information retrieval based on measuring the divergence from randomness (2002)
Carpineto, C., Romano, G.: A survey of automatic query expansion in information retrieval (2012)
Choi, F.Y.Y.: Advances in domain independent linear text segmentation. In: Proceedings of NAACL 2000 (2000)
Eskevich, M.: Towards effective retrieval of spontaneous conversational spoken content. Ph.D. thesis, Dublin City University (2014)
Eskevich, M., Jones, G.J.F., Wartena, C., Larson, M., Aly, R., Verschoor, T., Ordelman, R.: Comparing retrieval effectiveness of alternative content segmentation methods for internet video search. In: 2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI), pp. 1–6. IEEE (2012)
Garofolo, J., Auzanne, G., Voorhees, E.: The TREC spoken document retrieval track: a success story. In: Proceedings of RIAO 2000, pp. 1–8 (2000)
Amati, G.: Probabilistic models for information retrieval based on divergence from randomness. Ph.D. thesis, Department of Computing Science, University of Glasgow (2003)
Gu, Z., Luo, M.: Comparison of using passages and documents for blind relevance feedback in information retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 482–483. ACM (2004)
He, B., Ounis, I.: Studying query expansion effectiveness. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds.) ECIR 2009. LNCS, vol. 5478, pp. 611–619. Springer, Heidelberg (2009). doi:10.1007/978-3-642-00958-7_57
Khwileh, A., Ganguly, D., Jones, G.J.: Utilisation of metadata fields and query expansion in cross-lingual search of user-generated internet video (2016)
Khwileh, A., Jones, G.J.: Investigating segment-based query expansion for user-generated spoken content retrieval. In: 2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI), pp. 1–6. IEEE (2016)
Kurland, O., Shtok, A., Hummel, S., Raiber, F., Carmel, D., Rom, O.: Back to the roots: a probabilistic framework for query-performance prediction. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 823–832. ACM (2012)
Lam-Adesina, A.M., Jones, G.J.F.: Dublin City University at CLEF 2005: cross-language speech retrieval (CL-SR) experiments. In: Peters, C., et al. (eds.) CLEF 2005. LNCS, vol. 4022, pp. 792–799. Springer, Heidelberg (2006). doi:10.1007/11878773_87
Liu, X., Croft, W.B.: Passage retrieval based on language models. In: Proceedings of the Eleventh International Conference on Information and Knowledge Management, pp. 375–382. ACM (2002)
Mitra, M., Singhal, A., Buckley, C.: Improving automatic query expansion. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 206–214. ACM (1998)
Larson, M., Jones, G.J.F.: Spoken content retrieval: a survey of techniques and technologies (2011)
Pecina, P., Hoffmannová, P., Jones, G.J.F., Zhang, Y., Oard, D.W.: Overview of the CLEF-2007 cross-language speech retrieval track. In: Peters, C., et al. (eds.) CLEF 2007. LNCS, vol. 5152, pp. 674–686. Springer, Heidelberg (2008). doi:10.1007/978-3-540-85760-0_86
Schmiedeke, S., Xu, P., Ferrané, I., Eskevich, M., Kofler, C., Larson, M.A., Estève, Y., Lamel, L., Jones, G.J.F., Sikora, T.: Blip10000: a social video dataset containing SPUG content for tagging and retrieval. In: Proceedings of the 4th ACM Multimedia Systems Conference, pp. 96–101. ACM (2013)
Shtok, A., Kurland, O., Carmel, D.: Predicting query performance by query-drift estimation. In: Azzopardi, L., Kazai, G., Robertson, S., Rüger, S., Shokouhi, M., Song, D., Yilmaz, E. (eds.) ICTIR 2009. LNCS, vol. 5766, pp. 305–312. Springer, Heidelberg (2009). doi:10.1007/978-3-642-04417-5_30
Smucker, M.D., Allan, J., Carterette, B.: A comparison of statistical significance tests for information retrieval evaluation. In: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, pp. 623–632. ACM (2007)
Terol, R.M., Palomar, M., Martinez-Barco, P., Llopis, F., Muñoz, R., Noguera, E.: The University of Alicante at CL-SR track. In: Peters, C., et al. (eds.) CLEF 2005. LNCS, vol. 4022, pp. 769–772. Springer, Heidelberg (2006). doi:10.1007/11878773_84
Wang, J., Oard, D.W.: CLEF-2005 CL-SR at Maryland: document and query expansion using side collections and thesauri. In: Peters, C., et al. (eds.) CLEF 2005. LNCS, vol. 4022, pp. 800–809. Springer, Heidelberg (2006). doi:10.1007/11878773_88
Zhou, Y., Croft, W.B.: Query performance prediction in web search environments. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 543–550. ACM (2007)
Acknowledgments
This research was partially supported by Science Foundation Ireland in the ADAPT Centre (Grant 13/RC/2106) (www.adaptcentre.ie) at Dublin City University.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Khwileh, A., Way, A., Jones, G.J.F. (2017). Improving the Reliability of Query Expansion for User-Generated Speech Retrieval Using Query Performance Prediction. In: , et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2017. Lecture Notes in Computer Science, vol 10456. Springer, Cham. https://doi.org/10.1007/978-3-319-65813-1_4
DOI: https://doi.org/10.1007/978-3-319-65813-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65812-4
Online ISBN: 978-3-319-65813-1
eBook Packages: Computer Science, Computer Science (R0)