Superimposed Code-Based Indexing Method for Extracting MCTs from XML Documents

  • Wenxin Liang
  • Takeshi Miki
  • Haruo Yokota
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5181)

Abstract

With the exponential increase in the amount of XML data on the Internet, information retrieval techniques for tree-structured XML documents, such as keyword search, have become important. The results of a keyword search are often represented by minimum connecting trees (MCTs) rooted at the lowest common ancestors (LCAs) of the nodes containing all the search keywords. Recently, effective methods have been proposed, such as the stack-based algorithm for generating the lowest grouped distance MCTs (GDMCTs), which derive a more compact representation of the query results. However, these methods remain expensive when the XML documents and the number of search keywords become large. To achieve more efficient algorithms for extracting MCTs, especially lowest GDMCTs, we first consider two straightforward LCA detection methods: keyword B+-trees with Dewey-order labels and superimposed code-based indexing. We then propose a method for efficiently detecting the LCAs that combines these two indexing methods. We also present an effective solution for the false drop problem caused by the superimposed code. Finally, the proposed LCA detection methods are applied to generate the lowest GDMCTs. We conduct detailed experiments to evaluate the benefits of our proposed algorithms and show that the proposed combined method completely solves the false drop problem and outperforms the stack-based algorithm in extracting the lowest GDMCTs.
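
The two building blocks named in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the toy keyword index, the 4-bit codes, and all function names are assumptions made for illustration. It shows that the LCA of nodes with Dewey-order labels is the longest common prefix of those labels, and that a superimposed-code signature test can report a node that does not actually contain the queried keyword (a false drop), which an exact check against the keyword index then rejects.

```python
# Hypothetical toy data: keyword -> Dewey labels of the nodes that contain it.
# A Dewey label is the path of child positions from the root, e.g. (1, 2, 1).
KEYWORD_INDEX = {
    "xml": [(1, 1, 1)],
    "keyword": [(1, 1, 2)],
    "search": [(1, 2, 1)],
}

# Hand-picked 4-bit codes with two bits set each (assumed for illustration;
# a practical scheme would use much wider codes, typically derived by hashing).
CODES = {"xml": 0b0011, "keyword": 0b0101, "search": 0b0110}


def dewey_lca(labels):
    """Return the longest common prefix of the given Dewey labels."""
    prefix = labels[0]
    for label in labels[1:]:
        n = 0
        while n < min(len(prefix), len(label)) and prefix[n] == label[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


def is_ancestor_or_self(ancestor, label):
    """True if `ancestor` is a prefix of `label`."""
    return label[:len(ancestor)] == ancestor


def subtree_keywords(node):
    """Exact set of keywords occurring at or below `node`, via the index."""
    return {kw for kw, labels in KEYWORD_INDEX.items()
            if any(is_ancestor_or_self(node, lab) for lab in labels)}


def superimpose(keywords):
    """OR the codes of all keywords into one signature."""
    sig = 0
    for kw in keywords:
        sig |= CODES[kw]
    return sig


def may_contain(node, query):
    """Signature filter: no false negatives, but false drops are possible."""
    node_sig = superimpose(subtree_keywords(node))
    query_sig = superimpose(query)
    return (node_sig & query_sig) == query_sig


def contains_all(node, query):
    """Exact check against the keyword index; rejects false drops."""
    return set(query) <= subtree_keywords(node)


lca = dewey_lca(KEYWORD_INDEX["xml"] + KEYWORD_INDEX["keyword"])
print(lca)                               # (1, 1)
print(may_contain((1, 1), ["search"]))   # True: a false drop
print(contains_all((1, 1), ["search"]))  # False: the exact check rejects it
```

In this sketch the signature of node (1, 1) is 0b0011 | 0b0101 = 0b0111, which covers the code 0b0110 of "search" even though that keyword occurs only under node (1, 2). Wider codes make such collisions rarer, but only an exact check removes them entirely.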

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Wenxin Liang (1, 4)
  • Takeshi Miki (2)
  • Haruo Yokota (3, 4)
  1. CREST, Japan Science and Technology Agency (JST)
  2. Nomura Research Institute
  3. Department of Computer Science, Tokyo Institute of Technology
  4. Global Scientific Information and Computing Center, Tokyo Institute of Technology
