Logistic Regression and EVIs for XML Books and the Heterogeneous Track

  • Ray R. Larson
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4862)

Abstract

For this year’s INEX, UC Berkeley focused on the Book track and the Heterogeneous track. For these runs we used the TREC2 logistic regression probabilistic model with blind feedback, as well as Entry Vocabulary Indexes (EVIs) for the Books Collection MARC data. For the full-text records of the Book track, we encountered a number of interesting problems in setting up the database and ended up using page-level indexing of the full collection.

As (once again) the only group to actually submit runs for the Het track, we are guaranteed both the highest, and lowest, effectiveness scores for each task. However, because it was again deemed pointless to conduct the actual relevance assessments on the submissions of a single system, we do not know the exact values of these results.

Keywords

Logistic Regression, Information Retrieval, Relevance Feedback, Query Term, Probabilistic Retrieval
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Ray R. Larson (1)
  1. School of Information, University of California, Berkeley, Berkeley, USA
