No Tag, a Little Nesting, and Great XML Keyword Search

  • Lingbo Kong
  • Shiwei Tang
  • Dongqing Yang
  • Tengjiao Wang
  • Jun Gao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4182)


Keyword search, borrowed from Information Retrieval (IR), is one of the most convenient ways for ordinary users to find information of interest. As XML data becomes more widespread, adapting keyword search to XML data is attracting growing attention. In this paper, we make a first attempt at a nesting mechanism for XML keyword search, which requires only a little nesting in the query. This approach has several benefits. First, it is convenient for ordinary users, because they need no knowledge of how the target XML data is organized. Second, the nesting pattern can be easily transformed into structural hints, which follow the same mechanism as the XML data model. Finally, since no label information is needed, we can retrieve XML fragments from different schemas. In addition, this paper proposes a new similarity measure for retrieved XML fragments, which may come from different schemas. Its kernel is the KCAM (Keyword Common Ancestor Matrix) structure, which stores the level information of the SLCA (Smallest Lowest Common Ancestor) node between two keywords. By mapping XML fragments into KCAMs, their structural similarity can be computed as a matrix distance. The KCAM distance works well with the nested keyword method.
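
The KCAM idea can be illustrated with a minimal Python sketch. The abstract does not give the exact construction, so everything below is an assumption made for illustration: the function names (`keyword_paths`, `lca_level`, `kcam`, `kcam_distance`) are hypothetical, the root is taken to be level 1, a keyword matches an element when it appears as a whitespace-separated token in that element's text, the deepest common ancestor over all occurrence pairs stands in for the SLCA, and the Frobenius norm stands in for the matrix distance. It is not the authors' implementation.

```python
import math
import xml.etree.ElementTree as ET


def keyword_paths(root, keyword):
    """Collect paths (tuples of child indices from the root) to every
    element whose text contains `keyword` as a whitespace-separated token."""
    paths = []

    def walk(node, path):
        if node.text and keyword in node.text.split():
            paths.append(path)
        for i, child in enumerate(node):
            walk(child, path + (i,))

    walk(root, ())
    return paths


def lca_level(p, q):
    """Level of the lowest common ancestor of two nodes given their paths.
    Assumption: the root sits at level 1."""
    common = 0
    for a, b in zip(p, q):
        if a != b:
            break
        common += 1
    return common + 1


def kcam(xml_text, keywords):
    """Build a keyword-by-keyword matrix of common-ancestor levels.
    Assumption: with several occurrences, the deepest common ancestor wins;
    a keyword missing from the fragment contributes 0."""
    root = ET.fromstring(xml_text)
    occurrences = [keyword_paths(root, k) for k in keywords]
    n = len(keywords)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            levels = [lca_level(p, q)
                      for p in occurrences[i] for q in occurrences[j]]
            matrix[i][j] = max(levels, default=0)
    return matrix


def kcam_distance(a, b):
    """Frobenius norm of the difference (one possible matrix distance)."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


# Two fragments with entirely different tag names (different schemas).
frag_a = "<lib><book><title>XML</title><author>Kong</author></book></lib>"
frag_b = "<r><x><y><t>XML</t></y><a>Kong</a></x></r>"
kws = ["XML", "Kong"]

print(kcam(frag_a, kws))
print(kcam(frag_b, kws))
print(kcam_distance(kcam(frag_a, kws), kcam(frag_b, kws)))
```

Because the matrix records only levels and never tag names, the two fragments can be compared even though they share no labels, which is the property the abstract attributes to the KCAM distance.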







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Lingbo Kong (1)
  • Shiwei Tang (1, 2)
  • Dongqing Yang (1)
  • Tengjiao Wang (1)
  • Jun Gao (1)

  1. Department of Computer Science and Technology, Peking University, Beijing, China
  2. National Laboratory on Machine Perception, Peking University, Beijing, China
