
A multi-ranker model for adaptive XML searching

  • Special Issue Paper
The VLDB Journal

Abstract

Advances in computing technology have made it increasingly feasible to offer ubiquitous access to Web information through a variety of interaction devices, such as PCs, laptops, and palmtops. As XML has become a de facto standard for exchanging Web data, an interesting and practical research problem is the development of models and techniques that satisfy diverse needs and preferences in searching XML data. In this paper, we employ a list of simple XML-tagged keywords as a vehicle for searching XML fragments in a collection of XML documents. To deal with the diversified nature of XML documents as well as user preferences, we propose a novel multi-ranker model (MRM), which abstracts a spectrum of important XML properties and adapts the features to different XML search needs. The MRM is composed of three ranking levels. The lowest level consists of two categories of features: similarity and granularity. At the intermediate level, we define four tailored XML rankers (XRs), which are composed of different lower-level features and have different strengths in searching XML fragments. The XRs are trained via a learning mechanism called the Ranking Support Vector Machine in a voting Spy Naïve Bayes framework (RSSF). The RSSF takes as input a set of labeled fragments and feature vectors and generates as output Adaptive Rankers (ARs) in the learning process. The ARs are defined over the XRs and generated at the top level of the MRM. We show empirically that the RSSF improves the MRM significantly in a learning process that needs only a small set of training XML fragments. We demonstrate that the trained MRM is able to bring out the strengths of the XRs in order to adapt to different preferences and queries.
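The three-level structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' RSSF. The feature names (`content_sim`, `structure_sim`, `granularity`), the four ranker definitions, and the simple pairwise hinge-loss update (a simplified stand-in for a Ranking SVM, with no regularization and no Spy Naïve Bayes voting step) are all assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Tuple

# Level 1 (hypothetical): low-level features of one XML fragment,
# covering the similarity and granularity categories.
Features = Dict[str, float]

# Level 2 (hypothetical): four XML rankers (XRs), each built from a
# different mix of the low-level features.
XRS: List[Callable[[Features], float]] = [
    lambda f: f["content_sim"],
    lambda f: f["structure_sim"],
    lambda f: 0.5 * f["content_sim"] + 0.5 * f["structure_sim"],
    lambda f: f["granularity"],
]


def xr_scores(f: Features) -> List[float]:
    """Score one fragment with every XR."""
    return [xr(f) for xr in XRS]


def ar_score(weights: List[float], f: Features) -> float:
    """Level 3: an adaptive ranker (AR) as a weighted sum of XR scores."""
    return sum(w * s for w, s in zip(weights, xr_scores(f)))


def train_pairwise(
    prefs: List[Tuple[Features, Features]],
    epochs: int = 20,
    lr: float = 0.1,
) -> List[float]:
    """Learn AR weights from (preferred, less preferred) fragment pairs.

    Uses a plain pairwise hinge-loss update, which is only a simplified
    stand-in for the Ranking SVM named in the abstract.
    """
    weights = [0.0] * len(XRS)
    for _ in range(epochs):
        for better, worse in prefs:
            diff = [b - w for b, w in zip(xr_scores(better), xr_scores(worse))]
            margin = sum(wt * d for wt, d in zip(weights, diff))
            if margin < 1.0:  # pair not yet separated by the margin
                weights = [wt + lr * d for wt, d in zip(weights, diff)]
    return weights


def rank(weights: List[float], fragments: List[Features]) -> List[Features]:
    """Order fragments from highest to lowest AR score."""
    return sorted(fragments, key=lambda f: ar_score(weights, f), reverse=True)


# Made-up training pairs in which the user prefers the fragment with the
# higher granularity score when the similarity scores are equal.
PREFS: List[Tuple[Features, Features]] = [
    (
        {"content_sim": 0.5, "structure_sim": 0.5, "granularity": 0.9},
        {"content_sim": 0.5, "structure_sim": 0.5, "granularity": 0.2},
    ),
    (
        {"content_sim": 0.8, "structure_sim": 0.4, "granularity": 0.7},
        {"content_sim": 0.8, "structure_sim": 0.4, "granularity": 0.1},
    ),
]

WEIGHTS = train_pairwise(PREFS)
```

With these made-up preferences, only the weight on the granularity-based ranker is updated, so `rank(WEIGHTS, fragments)` orders otherwise identical fragments by their granularity score. A different set of preference pairs would shift weight toward other XRs, which is the kind of adaptation the abstract attributes to the ARs.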




Author information

Corresponding author

Correspondence to Ho Lam Lau.

About this article

Cite this article

Lau, H.L., Ng, W. A multi-ranker model for adaptive XML searching. The VLDB Journal 17, 57–80 (2008). https://doi.org/10.1007/s00778-007-0068-8

  • DOI: https://doi.org/10.1007/s00778-007-0068-8
