Domain-Specific CLIR of English, German and Russian Using Fusion and Subject Metadata for Query Expansion

  • Vivien Petras
  • Fredric Gey
  • Ray R. Larson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4022)


This paper describes the combined submissions of the Berkeley group for the domain-specific track at CLEF 2005. The first technique tested is data fusion: the results of multiple probabilistic searches against different XML components are merged, using both Logistic Regression (LR) algorithms and a version of the Okapi BM-25 algorithm. We also combine multiple translations of queries in cross-language searching. The second technique analyzed is query enhancement with domain-specific metadata (thesaurus terms). We describe our technique of Entry Vocabulary Modules, which associates query words with thesaurus terms, and suggest its use for monolingual as well as bilingual retrieval. We also describe different weighting and merging schemes for adding keywords to queries, as well as the translation techniques used.
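The abstract does not say how the result lists from the different searches are merged, so the following is only a minimal sketch of one common fusion scheme, not the authors' method. It assumes a weighted sum of min-max normalized scores (often called CombSUM). The function names, the weights and the document scores are all hypothetical.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one result list's scores to [0, 1] so lists are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores identical: treat every document as equally strong.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(result_lists, weights=None):
    """Fuse {doc_id: score} result lists by a weighted sum of normalized scores.

    Assumption: this is a generic CombSUM-style merge, used here only to
    illustrate the idea of fusing several searches for the same query.
    """
    if weights is None:
        weights = [1.0] * len(result_lists)
    fused = defaultdict(float)
    for scores, weight in zip(result_lists, weights):
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += weight * s
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical scores for one query from two searches:
# an LR search on a title component and a BM25 search on the full record.
lr_title = {"d1": 0.9, "d2": 0.4, "d3": 0.1}
bm25_full = {"d2": 12.0, "d3": 7.0, "d4": 2.0}

ranking = comb_sum([lr_title, bm25_full])
for doc, score in ranking:
    print(doc, round(score, 3))
```

Normalizing before summing matters here because LR probabilities and BM25 scores live on different scales; without it the BM25 list would dominate the merged ranking.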


Keywords (machine-generated, not supplied by the authors): Machine Translation; Average Precision; Round Robin; Query Expansion; Query Word




Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Vivien Petras (1)
  • Fredric Gey (2)
  • Ray R. Larson (1)

  1. School of Information Management and Systems, University of California, Berkeley, USA
  2. UC Data Archive & Technical Assistance (UC DATA), University of California, Berkeley, USA
