A multi-ranker model for adaptive XML searching

Lau, Ho Lam; Ng, Wilfred

doi:10.1007/s00778-007-0068-8

Special Issue Paper
Published: 01 September 2007

The VLDB Journal, Volume 17, pages 57–80 (2008)

Ho Lam Lau¹ &
Wilfred Ng¹

Abstract

The evolution of computing technology has made it more feasible to offer ubiquitous access to Web information through various kinds of interaction devices such as PCs, laptops, and palmtops. As XML has become a de facto standard for exchanging Web data, an interesting and practical research problem is the development of models and techniques that satisfy various needs and preferences in searching XML data. In this paper, we employ a list of simple XML tagged keywords as a vehicle for searching XML fragments in a collection of XML documents. To deal with the diversified nature of XML documents as well as user preferences, we propose a novel multi-ranker model (MRM), which is able to abstract a spectrum of important XML properties and adapt the features to different XML search needs. The MRM is composed of three ranking levels. The lowest level consists of two categories of features: similarity and granularity. At the intermediate level, we define four tailored XML rankers (XRs), which consist of different lower-level features and have different strengths in searching XML fragments. The XRs are trained via a learning mechanism called the Ranking Support Vector Machine in a voting Spy Naïve Bayes framework (RSSF). The RSSF takes as input a set of labeled fragments and feature vectors and generates as output Adaptive Rankers (ARs) in the learning process. The ARs are defined over the XRs and generated at the top level of the MRM. We show empirically that the RSSF is able to improve the MRM significantly in a learning process that needs only a small set of training XML fragments. We demonstrate that the trained MRM is able to bring out the strengths of the XRs in order to adapt to different preferences and queries.
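
The three-level structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' RSSF: the feature names (`content_sim`, `structure_sim`, `granularity`), the four ranker definitions, the training pairs, and all function names are hypothetical, and the Spy Naïve Bayes voting step is omitted entirely. The sketch only shows how a top-level ranker could be learned as a weighted combination of intermediate rankers from pairwise preferences, using subgradient descent on a pairwise hinge loss (the basic idea behind a linear ranking SVM).

```python
from typing import Dict, List, Tuple

# Level 1 (assumed): each fragment is described by low-level features in [0, 1].
Fragment = Dict[str, float]

# Level 2 (assumed): four illustrative XML rankers (XRs), each a fixed
# weighting of low-level features. Names and weights are invented here.
XRS: List[Dict[str, float]] = [
    {"content_sim": 1.0},
    {"structure_sim": 1.0},
    {"content_sim": 0.5, "structure_sim": 0.5},
    {"granularity": 1.0},
]


def xr_scores(fragment: Fragment) -> List[float]:
    """Score one fragment with every XR (weighted sum of its features)."""
    return [
        sum(w * fragment.get(name, 0.0) for name, w in xr.items())
        for xr in XRS
    ]


def train_adaptive_ranker(
    preferences: List[Tuple[Fragment, Fragment]],
    epochs: int = 50,
    lr: float = 0.1,
    reg: float = 0.01,
) -> List[float]:
    """Level 3: learn weights over the XR scores from (preferred, other) pairs.

    Minimises an L2-regularised pairwise hinge loss by subgradient descent.
    """
    weights = [0.0] * len(XRS)
    for _ in range(epochs):
        for better, worse in preferences:
            diff = [b - w for b, w in zip(xr_scores(better), xr_scores(worse))]
            margin = sum(wt * d for wt, d in zip(weights, diff))
            for i in range(len(weights)):
                # The hinge term contributes only while the margin is below 1.
                hinge_grad = -diff[i] if margin < 1.0 else 0.0
                weights[i] -= lr * (reg * weights[i] + hinge_grad)
    return weights


def ar_score(weights: List[float], fragment: Fragment) -> float:
    """Adaptive ranker score: learned linear combination of the XR scores."""
    return sum(wt * s for wt, s in zip(weights, xr_scores(fragment)))


# Hypothetical training data: this user prefers fragments b and c over a.
a = {"content_sim": 0.9, "structure_sim": 0.2, "granularity": 0.1}
b = {"content_sim": 0.4, "structure_sim": 0.3, "granularity": 0.9}
c = {"content_sim": 0.5, "structure_sim": 0.8, "granularity": 0.6}

learned = train_adaptive_ranker([(b, a), (c, a)])
ranked = sorted([a, b, c], key=lambda f: ar_score(learned, f), reverse=True)
print(ranked[-1] is a)  # the least-preferred fragment ends up last
```

Only two preference pairs are used here, loosely echoing the abstract's point that a small training set can suffice; the sketch makes no claim about the accuracy reported in the paper.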

Author information

Authors and Affiliations

Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
Ho Lam Lau & Wilfred Ng

Corresponding author

Correspondence to Ho Lam Lau.

About this article

Cite this article

Lau, H.L., Ng, W. A multi-ranker model for adaptive XML searching. The VLDB Journal 17, 57–80 (2008). https://doi.org/10.1007/s00778-007-0068-8

Received: 20 September 2006
Revised: 25 June 2007
Accepted: 08 July 2007
Published: 01 September 2007
Issue Date: January 2008
DOI: https://doi.org/10.1007/s00778-007-0068-8
