
Information Retrieval, Volume 8, Issue 4, pp 601–629

A Fusion Approach to XML Structured Document Retrieval

Ray R. Larson

Abstract

In this paper we evaluate the application of data fusion, or meta-search, methods that combine different algorithms and XML elements to content-oriented retrieval of XML structured data. The primary approach combines a probabilistic method based on logistic regression with the Okapi BM-25 algorithm for estimating the relevance of documents or XML elements, in conjunction with Boolean approaches for some query elements. In the evaluation we use the INEX XML test collection to examine the relative performance of individual algorithms and elements and to compare these with the performance of the data fusion approaches.
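To make the idea of fusing two ranked result lists concrete, here is a minimal sketch. The abstract does not say which fusion operator the paper uses, so this sketch assumes min-max score normalization followed by CombMNZ (the sum of normalized scores, multiplied by the number of runs that retrieved the item), a standard operator from Shaw and Fox. The logistic regression and BM-25 scoring steps themselves, and the Boolean component, are not implemented: each run is simply a hypothetical dictionary of precomputed scores, and the element paths and score values are invented for illustration.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one run's scores into [0, 1] so that runs become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: treat every retrieved item as the top item.
        return {item: 1.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def comb_mnz(runs: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """CombMNZ: sum of normalized scores times the number of runs retrieving the item."""
    totals: Dict[str, float] = {}
    hits: Dict[str, int] = {}
    for run in runs:
        for item, s in min_max_normalize(run).items():
            totals[item] = totals.get(item, 0.0) + s
            hits[item] = hits.get(item, 0) + 1
    fused = {item: totals[item] * hits[item] for item in totals}
    # Highest fused score first; ties broken by identifier for a stable order.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical runs over XML elements (paths and scores are invented).
lr_run = {"/article[1]/sec[2]": 0.82, "/article[1]/sec[3]": 0.40, "/article[2]/abs[1]": 0.65}
bm25_run = {"/article[1]/sec[2]": 12.1, "/article[2]/abs[1]": 7.3, "/article[3]/sec[1]": 9.8}

fused = comb_mnz([lr_run, bm25_run])
for element, score in fused:
    print(f"{score:.3f}  {element}")
```

Normalizing first matters because raw logistic regression estimates (probabilities) and BM-25 weights live on very different scales; the multiplier in CombMNZ then rewards elements that both methods retrieve, which is why `/article[2]/abs[1]` outranks `/article[3]/sec[1]` here despite its weak BM-25 score.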

Keywords

XML retrieval, data fusion, probabilistic retrieval, logistic regression



Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

School of Information Management and Systems, University of California, Berkeley, USA
