Exploiting Semantic Tags in XML Retrieval

  • Qiuyue Wang
  • Qiushi Li
  • Shan Wang
  • Xiaoyong Du
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6203)


With the new semantically annotated Wikipedia XML corpus, we investigate two research questions. Do the structural constraints in CAS queries help retrieval over an XML document collection that contains semantically rich tags? And how can the semantic tag information be exploited to improve CO queries, given that most users prefer to express queries in the simplest form? In this paper, we describe and analyze our work comparing CO and CAS queries over the document collection at the INEX 2009 ad hoc track, and we propose a method to improve the effectiveness of CO queries by enriching the element content representations with semantic tags. Our results show that enriching XML element representations with semantic tags is effective in improving early precision, while in terms of average precision, the strict interpretation of CAS queries is generally superior.
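The abstract does not spell out how element representations are enriched or scored, so the following is only a minimal sketch under stated assumptions: tag names are split into terms and appended to the element's content, and elements are ranked by a Dirichlet-smoothed query likelihood. The functions `enrich` and `score`, the parameter `mu`, the add-one collection smoothing, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def enrich(content, tags):
    """Return the element's content terms plus terms from its semantic tags.

    Assumption: tag names such as "musical_group" are split on underscores
    and appended once each; the paper may weight or combine them differently.
    """
    terms = tokenize(content)
    for tag in tags:
        terms.extend(tokenize(tag.replace("_", " ")))
    return terms


def score(query, element_terms, coll_counts, coll_len, mu=10.0):
    """Dirichlet-smoothed query log-likelihood of one element representation."""
    tf = Counter(element_terms)
    vocab = len(coll_counts)
    total = 0.0
    for term in tokenize(query):
        # Add-one smoothed collection model, so unseen terms keep nonzero mass.
        p_coll = (coll_counts.get(term, 0) + 1) / (coll_len + vocab)
        total += math.log((tf[term] + mu * p_coll) / (len(element_terms) + mu))
    return total


# Toy elements with made-up content and tags (illustrative only).
elements = [
    ("einstein developed the theory of relativity", ["physicist", "person"]),
    ("relativity was a celtic music band", ["musical_group"]),
]
enriched = [enrich(content, tags) for content, tags in elements]
coll_counts = Counter(term for terms in enriched for term in terms)
coll_len = sum(coll_counts.values())

query = "physicist relativity"
plain_score = score(query, tokenize(elements[0][0]), coll_counts, coll_len)
enriched_scores = [score(query, terms, coll_counts, coll_len) for terms in enriched]
```

In this toy setup the query term "physicist" occurs only as a tag, so the enriched representation of the first element scores higher than its plain content and also outranks the second element.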


Language Model, Mutual Exclusion, Keyword Query, Full Content, Language Modeling Approach
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Qiuyue Wang (1)
  • Qiushi Li (1)
  • Shan Wang (1)
  • Xiaoyong Du (1)
  1. School of Information and Key Laboratory of Data Engineering and Knowledge Engineering, MOE, Renmin University of China, Beijing, P.R. China
