Berkeley at GeoCLEF: Logistic Regression and Fusion for Geographic Information Retrieval

  • Ray R. Larson
  • Fredric C. Gey
  • Vivien Petras
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4022)


In this paper we describe the Berkeley (groups 1 and 2 combined) submissions and approaches to the GeoCLEF task for CLEF 2005. The two Berkeley groups used different systems and approaches for GeoCLEF, with some common themes. For Berkeley group 1 (Larson), the main technique was fusion of multiple probabilistic searches against different XML components, using both Logistic Regression (LR) algorithms and a version of the Okapi BM-25 algorithm. Berkeley group 2 (Gey and Petras) employed CLIR methods tested in previous CLEF evaluations, using Logistic Regression with Blind Feedback. Both groups used multiple translations of queries for cross-language searching, and the primary geographically based approach taken by both was query expansion with additional place names. The Berkeley1 group used GIR indexing techniques to georeference proper nouns in the text, using a gazetteer derived from the World Gazetteer (with both English and German names for each place), and automatically expanded the names of regions or countries in the topics with the names of the countries or cities they contain. The Berkeley2 group expanded queries manually, adding further place names.
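The two ideas named in the abstract, place-name query expansion and fusion of several ranked searches, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy gazetteer, the function names, and in particular the fusion rule (a CombSUM-style sum of min-max normalized scores), since the abstract does not state which fusion operator or normalization the Berkeley1 system actually used.

```python
from collections import defaultdict

# Hypothetical toy gazetteer: region name -> contained place names.
# Illustrative data only, not taken from the World Gazetteer.
TOY_GAZETTEER = {
    "benelux": ["Belgium", "Netherlands", "Luxembourg"],
}


def expand_query(terms, gazetteer):
    """Append contained place names for every term that names a known region."""
    expanded = list(terms)
    for term in terms:
        for place in gazetteer.get(term.lower(), []):
            if place not in expanded:
                expanded.append(place)
    return expanded


def min_max_normalize(scores):
    """Rescale one result list's scores to [0, 1] so lists become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(result_lists):
    """CombSUM-style fusion (assumed): sum normalized scores, rank descending."""
    fused = defaultdict(float)
    for scores in result_lists:
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += s
    # Ties are broken by document id to keep the ranking deterministic.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up scores standing in for an LR search and a BM-25 search.
lr_scores = {"d1": 0.9, "d2": 0.4, "d3": 0.1}
bm25_scores = {"d2": 12.0, "d3": 7.0, "d4": 2.0}
ranking = fuse([lr_scores, bm25_scores])
expanded = expand_query(["flooding", "Benelux"], TOY_GAZETTEER)
```

In this sketch a document retrieved by both searches (such as `d2`) accumulates score from each list, which is the usual motivation for fusing results from different algorithms or document components.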


Keywords: Machine Translation, Query Expansion, Query Translation, Document Component, Bilingual Task





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ray R. Larson
    • 1
  • Fredric C. Gey
    • 2
  • Vivien Petras
    • 1
  1. School of Information Management and Systems, University of California, Berkeley, USA
  2. UC Data Archive and Technical Assistance, University of California, Berkeley, USA
