World Wide Web, Volume 16, Issue 2, pp 171–193

ELCA evaluation for keyword search on probabilistic XML data

  • Rui Zhou (Email author)
  • Chengfei Liu
  • Jianxin Li
  • Jeffrey Xu Yu


As probabilistic data management becomes a major research focus and keyword search becomes an increasingly popular way to query data, it is natural to ask how keyword queries can be supported on probabilistic XML data. For keyword queries on deterministic XML documents, ELCA (Exclusive Lowest Common Ancestor) semantics admits more relevant fragments, rooted at the ELCAs, as results, and is more popular than other keyword query result semantics such as SLCA. In this paper, we investigate how to evaluate ELCA results for keyword queries on probabilistic XML documents. After defining probabilistic ELCA semantics in terms of possible world semantics, we propose an approach that computes ELCA probabilities without generating possible worlds. We then develop an efficient stack-based algorithm that finds all probabilistic ELCA results and their ELCA probabilities for a given keyword query on a probabilistic XML document. Finally, we experimentally evaluate the proposed ELCA algorithm and compare it with its SLCA counterpart in terms of result probability, time and space efficiency, and scalability.
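To make the semantics concrete, the following is a minimal sketch, not the paper's stack-based algorithm. It finds ELCAs on a small deterministic tree and then obtains ELCA probabilities by naively enumerating possible worlds, which is exactly the brute-force baseline the proposed approach avoids. The `Node` class, the function names, and the probability model (each node exists independently with probability `prob` given that its parent exists) are assumptions made for illustration; the abstract does not specify the probabilistic XML model used.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Node:
    label: str                                    # hypothetical node identifier
    words: frozenset = frozenset()                # keywords stored at this node
    children: list = field(default_factory=list)
    prob: float = 1.0                             # assumed P(exists | parent exists)


def elcas(root, query):
    """Return labels of the ELCA nodes of a deterministic tree, in post-order."""
    query = frozenset(query)
    found = []

    def visit(node):
        own = node.words & query
        full = set(own)        # query keywords anywhere in this subtree
        exclusive = set(own)   # keywords left after excluding covering subtrees
        for child in node.children:
            child_full = visit(child)
            full |= child_full
            # A child subtree that contains every keyword is excluded as a whole.
            if child_full != query:
                exclusive |= child_full
        if query and exclusive == query:
            found.append(node.label)
        return full

    visit(root)
    return found


def worlds(node):
    """Yield (probability, tree) for each possible world in which node exists."""
    per_child = []
    for child in node.children:
        options = [(1.0 - child.prob, None)]  # the child is absent
        options += [(child.prob * p, tree) for p, tree in worlds(child)]
        per_child.append(options)
    for combo in product(*per_child):
        p = 1.0
        kids = []
        for q, tree in combo:
            p *= q
            if tree is not None:
                kids.append(tree)
        if p > 0:  # skip worlds that cannot occur
            yield p, Node(node.label, node.words, kids)


def elca_probabilities(root, query):
    """Sum world probabilities per node label (exponential brute force)."""
    totals = {}
    for p, world in worlds(root):
        for label in elcas(world, query):
            totals[label] = totals.get(label, 0.0) + p
    return totals


doc = Node("r", children=[
    Node("a", children=[
        Node("a1", frozenset({"xml"}), prob=0.5),
        Node("a2", frozenset({"search"})),
    ]),
    Node("b", frozenset({"xml"})),
    Node("c", frozenset({"search"})),
])
print(elcas(doc, {"xml", "search"}))               # ['a', 'r']
print(elca_probabilities(doc, {"xml", "search"}))
```

In this toy document, `a` would be the only SLCA, while ELCA semantics also returns `r`, because `b` and `c` still supply both keywords once the subtree of `a` is excluded. Under the assumed model, `r` is an ELCA in every world with non-zero probability (1.0), whereas `a` is an ELCA only when `a1` exists (0.5). The number of worlds grows exponentially with the number of uncertain nodes, which is why computing these probabilities without enumeration matters.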


Keywords: ELCA, probabilistic XML, keyword search, keyword query, uncertain





Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Rui Zhou (1), Email author
  • Chengfei Liu (1)
  • Jianxin Li (1)
  • Jeffrey Xu Yu (2)

  1. Faculty of Information & Communication Technologies, Swinburne University of Technology, Swinburne, Australia
  2. Department of Systems Engineering & Engineering Management, The Chinese University of Hong Kong, Hong Kong, China
