XML-Structured Documents: Retrievable Units and Inheritance

  • Stephen Robertson
  • Wei Lu
  • Andrew MacFarlane
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4027)

Abstract

We consider the retrieval of XML-structured documents, and of passages from such documents, defined as elements of the XML structure. These are considered from the point of view of passage retrieval, as a form of document retrieval. A retrievable unit (an element chosen as defining suitable passages for retrieval) is a textual document in its own right, but may inherit information from the other parts of the same document. Again, this inheritance is defined in terms of the XML structure. All retrievable units are mapped onto a common field structure, and the ranking function is a standard document retrieval function with a suitable field weighting. A small experiment to demonstrate the idea, using INEX data, is described.
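
The idea of mapping retrievable units onto a common field structure, with inherited fields and a field-weighted ranking function, can be illustrated with a small sketch. This is not the authors' implementation: the field names, the weights, the whitespace tokenisation and the particular BM25-style formula (field-weighted term frequencies combined before saturation) are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical field weights; names and values are illustrative only.
FIELD_WEIGHTS = {"body": 1.0, "section_title": 2.0, "article_title": 3.0}


def make_unit(body, section_title="", article_title=""):
    """Map a retrievable unit onto the common field structure.

    `body` is the unit's own text; the two title fields stand for text
    inherited from other parts of the same document.
    """
    return {
        "body": body.lower().split(),
        "section_title": section_title.lower().split(),
        "article_title": article_title.lower().split(),
    }


def weighted_tf(unit):
    """Combine per-field term counts into one weighted pseudo-frequency."""
    tf = Counter()
    for field, tokens in unit.items():
        for tok in tokens:
            tf[tok] += FIELD_WEIGHTS[field]
    return tf


def weighted_length(unit):
    """Unit length, with each field's length scaled by its weight."""
    return sum(FIELD_WEIGHTS[f] * len(toks) for f, toks in unit.items())


def score(query, unit, units, k1=1.2, b=0.75):
    """BM25-style score of `unit` for `query` over the collection `units`."""
    n = len(units)
    avg_len = sum(weighted_length(u) for u in units) / n
    tf = weighted_tf(unit)
    length_norm = 1 - b + b * weighted_length(unit) / avg_len
    total = 0.0
    for term in query.lower().split():
        # Document frequency: number of units containing the term in any field.
        df = sum(1 for u in units if term in weighted_tf(u))
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        total += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
    return total


# Toy collection: two sections of one article, plus an unrelated unit.
units = [
    make_unit("field weighting for ranking",
              section_title="ranking", article_title="xml retrieval"),
    make_unit("experiments on test data",
              section_title="evaluation", article_title="xml retrieval"),
    make_unit("cooking pasta at home",
              section_title="recipes", article_title="kitchen notes"),
]

for u in units:
    print(round(score("xml ranking", u, units), 3))
```

In this toy collection the first two units match the query term "xml" only through the inherited article title, which is the effect inheritance is meant to have: a unit can be retrieved on evidence that lies outside its own text.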

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Stephen Robertson (1, 2)
  • Wei Lu (3, 2)
  • Andrew MacFarlane (4)

  1. Microsoft Research, Cambridge, UK
  2. City University, UK
  3. Center for Studies of Information Resources, School of Information Management, Wuhan University, China
  4. Centre for Interactive Systems Research, Department of Information Science, City University, London, UK