Cross-Language Pseudo-Relevance Feedback Techniques for Informal Text

  • Chia-Jung Lee
  • W. Bruce Croft
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8416)


Previous work has shown that pseudo-relevance feedback (PRF) can be effective for cross-lingual information retrieval (CLIR). That research was based primarily on corpora, such as news articles, that are written in relatively formal language. In this paper, we revisit CLIR with a focus on the problems that arise with informal text, such as blogs and forums. To address the two major sources of “noisy” text, namely translation and the informal nature of the documents, we propose selecting between inter- and intra-language PRF based on the properties of the language of the query and of the corpora being searched. Experimental results show that this approach can significantly outperform state-of-the-art results reported for monolingual and cross-lingual environments. Further analysis indicates that inter-language PRF is particularly helpful for queries with poor translation quality, whereas intra-language PRF is more useful for queries with high translation quality, as it reduces the impact of any potential translation errors in documents.
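
As a rough illustration of the two ideas summarized above, the following sketch shows a generic PRF expansion step and a simple selector between feedback modes. It is not the method of the paper: the abstract does not state which features drive the selection, so the `translation_quality` score, the `threshold` value, and the function names `prf_expand` and `choose_prf_mode` are all hypothetical, and the term-frequency expansion stands in for whatever feedback model is actually used.

```python
from collections import Counter


def prf_expand(query_terms, ranked_docs, k=2, n_terms=2):
    """Generic PRF: treat the top-k ranked documents as relevant and
    append the n_terms most frequent terms not already in the query.
    Each document is a list of tokens. (Illustrative only.)"""
    counts = Counter()
    for doc in ranked_docs[:k]:
        counts.update(t for t in doc if t not in query_terms)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked_terms = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_terms) + [t for t, _ in ranked_terms[:n_terms]]


def choose_prf_mode(translation_quality, threshold=0.5):
    """Hypothetical selector: prefer intra-language PRF when the query
    translation looks reliable, inter-language PRF otherwise.
    Both the score and the threshold are assumptions of this sketch."""
    return "intra" if translation_quality >= threshold else "inter"


docs = [
    ["phone", "battery", "drain", "fix"],
    ["battery", "drain", "charger"],
    ["weather"],
]
expanded = prf_expand(["phone", "battery"], docs)
mode = choose_prf_mode(0.2)
```

The selector mirrors only the qualitative finding reported in the abstract (inter-language PRF for poorly translated queries, intra-language PRF for well translated ones); a single threshold is the simplest way to express that choice.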


Keywords: informal text; discussion forum; cross-language information retrieval; pseudo-relevance feedback





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Chia-Jung Lee (1)
  • W. Bruce Croft (1)
  1. Center for Intelligent Information Retrieval, School of Computer Science, University of Massachusetts, Amherst, USA
