A Probabilistic Retrieval Model for Semistructured Data

  • Conference paper
Advances in Information Retrieval (ECIR 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5478)

Abstract

Retrieving semistructured (XML) data typically requires either a structured query such as XPath, or a keyword query that does not take structure into account. In this paper, we infer structural information automatically from keyword queries and incorporate this into a retrieval model. More specifically, we propose the concept of a mapping probability, which maps each query word into a related field (or XML element). This mapping probability is used as a weight to combine the language models estimated from each field. Experiments on two test collections show that our retrieval model based on mapping probabilities outperforms baseline techniques significantly.
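
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the mapping probability is estimated or how the field language models are smoothed. The sketch assumes that the mapping probability of a field given a query word is proportional to the word's relative frequency in that field across the collection (with a uniform field prior), and that each per-field document language model uses Dirichlet smoothing with a hypothetical parameter `mu`. The toy documents, field names, and function names are all illustrative.

```python
import math
from collections import Counter


def field_collection_counts(docs, fields):
    """Count word occurrences per field over the whole collection."""
    counts = {f: Counter() for f in fields}
    for doc in docs:
        for f in fields:
            counts[f].update(doc.get(f, "").lower().split())
    return counts


def mapping_probability(word, counts):
    """Estimate P(field | word).

    Assumption (not from the abstract): P(field | word) is proportional to
    the word's relative frequency in that field, with a uniform field prior.
    """
    likelihood = {}
    for f, c in counts.items():
        total = sum(c.values())
        likelihood[f] = c[word] / total if total else 0.0
    norm = sum(likelihood.values())
    if norm == 0.0:
        # Unseen word: fall back to a uniform distribution over fields.
        return {f: 1.0 / len(counts) for f in counts}
    return {f: p / norm for f, p in likelihood.items()}


def score(query, doc, counts, mu=10.0):
    """Log-score of a document: for each query word, combine the per-field
    language models using the mapping probabilities as weights.

    Assumption: each field model is Dirichlet-smoothed with parameter mu.
    """
    log_score = 0.0
    for w in query.lower().split():
        weights = mapping_probability(w, counts)
        p_w = 0.0
        for f, c in counts.items():
            tokens = doc.get(f, "").lower().split()
            coll_total = sum(c.values())
            p_coll = c[w] / coll_total if coll_total else 0.0
            p_field = (tokens.count(w) + mu * p_coll) / (len(tokens) + mu)
            p_w += weights[f] * p_field
        log_score += math.log(p_w) if p_w > 0 else float("-inf")
    return log_score


# Toy collection with two fields (hypothetical data).
docs = [
    {"title": "blade runner", "actor": "harrison ford"},
    {"title": "star wars", "actor": "harrison ford"},
    {"title": "harrison story", "actor": "john smith"},
]
counts = field_collection_counts(docs, ["title", "actor"])

# "harrison" occurs twice in actor fields and once in a title field.
print(mapping_probability("harrison", counts))
ranked = sorted(docs, key=lambda d: score("harrison", d, counts), reverse=True)
print([d["title"] for d in ranked])
```

Because "harrison" is more common in the `actor` field than in the `title` field, the keyword query leans toward matches in `actor` without the user writing a structured query, which is the behaviour the abstract describes.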

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, J., Xue, X., Croft, W.B. (2009). A Probabilistic Retrieval Model for Semistructured Data. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_22

  • DOI: https://doi.org/10.1007/978-3-642-00958-7_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00957-0

  • Online ISBN: 978-3-642-00958-7

  • eBook Packages: Computer Science, Computer Science (R0)
