
Querying and ranking incomplete twigs in probabilistic XML

World Wide Web

Abstract

Regarded as the next-generation language of the Internet, XML has become the de facto standard for information exchange over the Web. A core operation in XML query processing is finding all occurrences of a twig pattern in an XML database. Meanwhile, probabilistic data has emerged as an important topic for many Web applications, so combining XML twig pattern matching with probabilistic data is a significant research problem. In prior work on probabilistic XML, the answers to a twig query are always complete. However, complete answers with low probabilities may be deemed irrelevant, whereas incomplete answers with high probabilities can be highly relevant, because they may be exactly the potential answers that interest users. Evaluating incomplete twigs in probabilistic XML introduces new challenges that complete evaluation does not face. On the one hand, an incomplete query returns not only complete matches but also a considerable number of incomplete ones. On the other hand, incomplete evaluation is more complicated to process. A ranking approach should therefore be adopted alongside the evaluation of incomplete answers. In this paper, we propose an efficient algorithm for querying incomplete twigs over probabilistic XML databases. We also present a novel algorithm for ranking the incomplete answers. Experimental results show that the proposed algorithms significantly improve the performance of querying and ranking incomplete twigs.
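To make the central point concrete (an incomplete answer with a high probability can outrank a complete answer with a low one), here is a minimal Python sketch. It is not the authors' algorithm. It assumes a toy probabilistic XML model in which each node's `prob` is its existence probability given its parent and siblings are independent; twig queries with parent-child edges only and unique labels; and an "incomplete" answer that drops a query branch only when the matched document node has no child with that label. Each answer is scored by the probability that all of its matched nodes exist, ignoring the probability that the missing parts are absent. All names (`PNode`, `QNode`, `matches`, `evaluate`, `rank`) are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class PNode:
    """Node of a toy probabilistic XML tree (hypothetical model).

    prob is the probability that the node exists given that its parent exists;
    siblings are treated as independent (an assumption of this sketch)."""
    name: str
    label: str
    prob: float = 1.0
    children: list = field(default_factory=list)


@dataclass
class QNode:
    """Twig query node; every query edge is a parent-child edge here."""
    label: str
    children: list = field(default_factory=list)


def query_size(q):
    return 1 + sum(query_size(c) for c in q.children)


def matches(q, d):
    """Yield (assignment, prob) for rooted, possibly incomplete matches of q at d.

    assignment maps query labels (assumed unique) to document node names.
    A query child is left unmatched, together with its query subtree, only
    when d has no child carrying its label; prob multiplies the conditional
    existence probabilities of the matched nodes below d."""
    per_child = []
    for qc in q.children:
        options = [(sub, dc.prob * p)
                   for dc in d.children if dc.label == qc.label
                   for sub, p in matches(qc, dc)]
        per_child.append(options or [({}, 1.0)])  # empty: incomplete branch
    for combo in product(*per_child):
        assignment, prob = {q.label: d.name}, 1.0
        for sub, p in combo:
            assignment.update(sub)
            prob *= p
        yield assignment, prob


def evaluate(query, doc):
    """Find matches of the query root anywhere in doc, weighting each answer
    by the existence probability of the path from the document root."""
    answers, stack = [], [(doc, doc.prob)]
    while stack:
        node, path_prob = stack.pop()
        if node.label == query.label:
            answers += [(a, path_prob * p) for a, p in matches(query, node)]
        stack += [(c, path_prob * c.prob) for c in node.children]
    return answers


def rank(answers):
    """Order by probability, breaking ties in favour of more complete answers."""
    return sorted(answers, key=lambda ap: (ap[1], len(ap[0])), reverse=True)


# Usage: query a[b][c] over a document with one unlikely complete match
# (under a1) and one likely incomplete match (under a2, which has no c child).
query = QNode("a", [QNode("b"), QNode("c")])
doc = PNode("r", "r", 1.0, [
    PNode("a1", "a", 0.2, [PNode("b1", "b", 0.9), PNode("c1", "c", 0.9)]),
    PNode("a2", "a", 0.95, [PNode("b2", "b", 0.9)]),
])
n = query_size(query)
for assignment, p in rank(evaluate(query, doc)):
    kind = "complete" if len(assignment) == n else "incomplete"
    print(kind, assignment, round(p, 3))
# incomplete {'a': 'a2', 'b': 'b2'} 0.855
# complete {'a': 'a1', 'b': 'b1', 'c': 'c1'} 0.162
```

Ranking by probability alone puts the incomplete match rooted at `a2` (0.95 × 0.9) ahead of the complete match rooted at `a1` (0.2 × 0.9 × 0.9). A practical ranking would likely also weigh how much of the query each answer covers; the tie-breaker on answer size here is only a placeholder for that.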

Author information

Correspondence to Z. M. Ma.

About this article

Cite this article

Liu, J., Ma, Z.M. & Yan, L. Querying and ranking incomplete twigs in probabilistic XML. World Wide Web 16, 325–353 (2013). https://doi.org/10.1007/s11280-011-0149-x

  • DOI: https://doi.org/10.1007/s11280-011-0149-x