Efficient Keyword Search over Data-Centric XML Documents

  • Guoliang Li
  • Jianhua Feng
  • Na Ta
  • Lizhu Zhou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4505)


This paper investigates keyword search over data-centric XML documents. We first present a novel method for dividing an XML document into self-integrated subtrees, which are connected subtrees that capture different structural information of the XML document. We then propose meaningful self-integrated trees, which contain all the query keywords and describe how the keywords are interrelated, as the answers to keyword queries over XML documents. In addition, we introduce a B+-tree index to accelerate the retrieval of these meaningful self-integrated trees. To further improve performance, we use a Bloom filter to generate the meaningful self-integrated trees more efficiently. Finally, we conduct extensive experiments to evaluate our method; the results demonstrate that it achieves high efficiency and significantly outperforms existing approaches.
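As a rough illustration of why a Bloom filter can speed up this kind of filtering, the sketch below keeps one small filter per subtree and uses it to discard subtrees that cannot contain every query keyword. It is not the authors' implementation: the class `BloomFilter`, the helper `candidate_subtrees`, the subtree names, and the sizing parameters are all hypothetical, and the abstract does not say how the filter is actually attached to the self-integrated trees.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizing values are illustrative assumptions, not taken from the paper.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 hash.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Hypothetical data: one filter per subtree, holding the keywords under it.
subtree_keywords = {
    "subtree_a": ["xml", "keyword", "search"],
    "subtree_b": ["bloom", "filter"],
}
filters = {}
for name, words in subtree_keywords.items():
    bf = BloomFilter()
    for word in words:
        bf.add(word)
    filters[name] = bf


def candidate_subtrees(query):
    """Return subtrees whose filter may contain every query keyword."""
    return [
        name for name, bf in filters.items()
        if all(keyword in bf for keyword in query)
    ]
```

A subtree that really contains all the keywords is always returned, because a Bloom filter never reports a false negative. A returned subtree may still be a false positive, so a real system would verify each candidate against the actual data.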





Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Guoliang Li (1)
  • Jianhua Feng (1)
  • Na Ta (1)
  • Lizhu Zhou (1)
  1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, P.R. China
